5 million scans of VOC archives online and searchable

Colonial history at your fingertips with one click

Amsterdam, 2 October 2023 – On Wednesday 4 October, the GLOBALISE project of the Huygens Institute and several partners will make five million scans from the archives of the Dutch East India Company (VOC) digitally available, transcribed using automatic handwritten text recognition. This will make the information in these archives on such fraught topics as slavery and colonial violence many times easier to search and read. And that will lead to new perspectives on the colonial past.

Since 2003, the VOC archives have been recognised as part of UNESCO's Memory of the World. The part of the archives now made accessible, known as the Overgekomen Brieven en Papieren, contains detailed information about the VOC’s actions in the seventeenth and eighteenth centuries. The documents testify to Dutch colonial history in Asia. They are also important for the local historiography of the societies with which the VOC came into contact.

New light on the history of colonialism and Asia

Text recognition will lead to new perspectives on the colonial past. Matthias van Rossum, project leader (IISG): ‘Searchability means information in the archives can be found much more completely and quickly. For example, the consequences of the terrible depopulation of the island of Liuqiu near Taiwan by the VOC in 1636 can be found at the click of a button. We know that the VOC was not only a trading company but also acted as a colonial government. But far too little research has been done on that. These archives show how the VOC acted against colonised and non-European societies.’

The archives contain information about many people, places and events across a huge area. That offers opportunities for all kinds of research, not only in the Netherlands but worldwide. Manjusha Kuruppath, researcher (Huygens Institute): ‘You can do genealogical research and look for references to people. Or you can research local histories, for instance in Indonesia or India, by looking for information about events, places or communities. You can even find information about phenomena such as disease and climate, which in turn can be of great interest for world historiography.’

This is only the first version of transcripts in a simple search environment. There is much more to come. Lodewijk Petram, project manager (Huygens Institute): ‘We find it very important that this data is available to as many people as possible as soon as possible. We expect to come up with an improved version by the end of the year. We do not limit our work only to text recognition. We are also bringing together data that gives contextual information to certain terms. And we are working on a search interface that supports new research methods. This will also make the VOC archives more accessible to an international audience.’

Text recognition for world heritage

The text recognition was carried out with the KNAW Humanities Cluster’s open-source Loghi transcription software. In recent years, the VOC archives in the Netherlands, Indonesia and Sri Lanka, among other countries, have largely been digitised. In 2021, the GLOBALISE project was awarded a grant from the Netherlands Organisation for Scientific Research (NWO) to develop a digital research infrastructure for part of these archives.

GLOBALISE is a project of the Huygens Institute in cooperation with the International Institute of Social History, the National Archive, the Digital Infrastructure Department of the KNAW Humanities Cluster, the Vrije Universiteit Amsterdam and the University of Amsterdam.