The many large digital text collections and computational tools available online today allow humanities scholars to address a wide array of research questions. This often involves many data transformations: gathering and selecting documents, extracting and modelling the relevant data in them, cleaning and normalising this data, and linking dispersed information both within the resource under study and to external resources. All these transformations change the nature of the data, and each requires interpretation to make informed choices. We are developing Data Scopes as an instrument to make this transformation process transparent and to link it explicitly to the research questions.
The Data Scopes instrument has several components:
With computational humanities research, the research process has changed, but how is this reflected in the way we publish our research? A traditional historical narrative focuses on how connecting the evidence from dispersed documents gives us new perspectives and insights, but leaves out the many data processing steps that were taken. One question Data Scopes aims to address is how we can publish digital research in multiple layers that connect the narrative with the data and the processing steps.
Each corpus has its own characteristics, which determine what kinds of information we can extract from it and how we can organise this information into layers of annotations:
- How do the characteristics of a corpus translate to the types of information that can be used to organise and give access to the corpus?
- How do we translate potential research questions into relevant annotation layers and data queries for a corpus?
- How do we give insight into what kinds of layers of annotations and research questions are possible given the data?
Different amounts of data require different ways of accessing and processing them.
- How do the needs for data elaboration change with different amounts of data?
- Which methods are effective when we have dozens of documents, and which when we have thousands or millions? How do we translate qualitative methods for small amounts of data to quantitative methods for larger amounts of data? How do we translate methods that rely on statistics derived from hundreds of thousands of documents to a corpus that has only a few thousand documents?
- How can we connect micro and macro perspectives? At what scale do we switch perspectives, and why? Where are the transitions in relevant aspects and patterns?
- How can we iteratively zoom in and out to switch between micro and macro perspectives and between qualitative and quantitative methods?
To develop and test this instrument, we apply it to a number of projects, including REPUBLIC (a corpus of political historical documents such as meeting notes, decisions and Ordonnances), Migrant: Mobilities and Connection (which includes institutions, policy archives, card indexes, registers and (distributed) databases), and Impact of Fiction (focussing on book reviews and discussions).