Digital forensics for historical documents

Duration: June 2018 - December 2022
Subsidy provider: KNAW Institutes Research Fund
Subsidy size: 500,000 euro
Remarkable: More and more digital images of handwritten sources are available online. The widely embraced standard for providing and processing them, IIIF, offers the possibility of considering this material as Big Data, and using AI to develop new research methods.
Valorisation: In this experimental project, two datasets of historical handwritten material were made suitable for different research questions. The output is not so much a new software tool as insight into the processes and data cleaning required to build a deep-learning system for this material.

In the project ‘Digital forensics for historical documents. Cracking cold cases with new technology’, funded by the Research Fund of the Royal Netherlands Academy of Arts and Sciences (KNAW), the Huygens Institute and the IISH collaborate. Techniques of digital image analysis are used to create new ways to analyse samples of historical script. The project started in June 2018 and ended in December 2022. The dissertation that is part of the project’s output is still a work in progress.

Palaeographical analysis with a forensic method

The project Digital Forensics aims to bridge two different approaches to handwriting analysis: the forensic method (graphanalysis) and the palaeographical method (the study of the development of scripts through space and time). In forensic research, the aim is to establish a unique script profile for each individual, in order to determine who wrote which text. In palaeography, the study of old scripts, the aim is to establish when and where something was written. In Digital Forensics, the two methods will be combined in a single ‘deep learning system’: we will use training datasets to teach a software system to recognise similarities and differences in script, so that it will be able to match similar script samples in a palaeographically meaningful way. This method is now possible for the first time because of two recent, powerful developments: the mass digitization of medieval manuscripts and the adoption of a shared standard for the resulting images, IIIF (International Image Interoperability Framework).

Two projects

The project has two subprojects. In one subproject the analysis is aimed at identifying individual hands (‘who wrote which text?’). This subproject is led by Matthias van Rossum (IISH) and works with material from the administration of the Dutch East India Company. In the other subproject a new method is created to analyse the dating and location of medieval scripts (‘what was written when and where?’). This subproject is led by Mariken Teeuwen (Huygens Institute), with Hannah Busch (now researcher at the Cologne Institute for eHumanities) as a PhD student. Both subprojects rest firmly on the work of Rutger van Koert (Department of Digital Infrastructure of the KNAW Humanities Cluster), who builds the software system for the analyses. In addition, a team of internal and external partners provides feedback and advice, drawing on both historical and technical expertise.