Data Scopes

The many large digital text collections and computational tools available online today allow humanities scholars to address a wide array of research questions. Doing so often involves many data transformations: gathering and selecting documents, extracting and modelling the relevant data in them, cleaning and normalising this data, and linking dispersed information both within the resource under study and to external resources. Each of these transformations changes the nature of the data and calls for interpretation, so that the choices made are informed ones. We are developing Data Scopes as an instrument to make this transformation process transparent and explicitly linked to the research questions.
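As an illustration only, the idea of a transparent transformation process can be sketched in a few lines of Python. This is not the Data Scopes implementation: the names `Step`, `LoggedPipeline` and `apply`, the research question and the toy documents are all hypothetical, chosen just to show how each transformation could be recorded together with the interpretive choice behind it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One recorded transformation (hypothetical structure)."""
    name: str        # kind of transformation, e.g. "select" or "normalise"
    rationale: str   # the interpretive choice behind the step
    n_in: int        # number of documents before the step
    n_out: int       # number of documents after the step


@dataclass
class LoggedPipeline:
    """Applies transformations and keeps a log tied to a research question."""
    research_question: str
    log: List[Step] = field(default_factory=list)

    def apply(self, name: str, rationale: str,
              func: Callable[[List[str]], List[str]],
              docs: List[str]) -> List[str]:
        result = func(docs)
        self.log.append(Step(name, rationale, len(docs), len(result)))
        return result


# Invented toy documents, used only for illustration.
docs = ["  Resolutie van 1705 ", "resolutie VAN 1705", "Unrelated note"]

pipeline = LoggedPipeline("How often are resolutions mentioned?")

selected = pipeline.apply(
    "select",
    "keep documents mentioning 'resolutie'",
    lambda ds: [d for d in ds if "resolutie" in d.lower()],
    docs,
)
normalised = pipeline.apply(
    "normalise",
    "lower-case and trim whitespace",
    lambda ds: [d.strip().lower() for d in ds],
    selected,
)

# The log can be published alongside the narrative.
for step in pipeline.log:
    print(f"{step.name}: {step.n_in} -> {step.n_out} ({step.rationale})")
```

Keeping the rationale next to each step is one possible way to link processing choices back to the research question; other designs, such as provenance records stored with the data itself, would serve the same purpose.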

The Data Scopes instrument has several components:

Multi-layered publication

With computational humanities research, the research process has changed, but how is this reflected in the way we publish our research? A traditional historical narrative focuses on how connecting the evidence from dispersed documents gives us new perspectives and insights, but leaves out the many data processing steps that were taken. One question Data Scopes aims to address is how we can publish digital research in multiple layers that connect the narrative with the data and the processing.

Annotation layers

Each corpus has its own characteristics, which determine what kinds of information we can extract from it and how we can organise this information into layers of annotations.

Data scale

Different amounts of data require different ways of accessing and processing them.

To develop and test this instrument, we apply it to a number of projects, including REPUBLIC (a corpus of political historical documents such as meeting notes, decisions and Ordonnances), Migrant: Mobilities and Connection (which includes institutions, policy archives, card indexes, registers and (distributed) databases), and Impact of Fiction (focussing on book reviews and discussions).