New Project ‘Linking Textual Data’ started.

Our newest project “Linking Textual Data” is part of the Consortium for the Social, Behavioural, Educational and Economic Sciences (KonsortSWD) and contributes to the National Research Data Infrastructure (NFDI).

In this new project, which officially started on January 1st, 2021, we develop tools for linking different data types in order to open up new avenues for research. Corpora have become an important data type in the social sciences, and methods for analysing text are developing rapidly. A crucial aspect of the huge potential of large-scale corpora for the social sciences is the capability to link the analysis of text with other data types, such as surveys. Yet the barriers to data linkage are still high. The tools and workflows we will develop and share are intended to improve our ability to gain new insights through the combination of different data types. Not surprisingly, corpora play a central role in our plans.

The NFDI is the broader context of our endeavour. It aims to secure and utilize research data in a systematic and sustainable way. To reach this goal, the NFDI plans to establish the management of research data according to the “FAIR” principles. These principles mandate that research data are findable, accessible, interoperable and reusable for and by the scientific community. Furthermore, the NFDI aims to connect to international initiatives with similar goals, such as the European Open Science Cloud (EOSC). The efforts of the NFDI are not limited to particular disciplines: they include consortia as diverse as chemistry, culture, health, engineering and many more.

Within the NFDI, the social sciences are represented through KonsortSWD. Specifically, KonsortSWD strengthens, widens and deepens a research data infrastructure for the social, educational, behavioural and economic sciences. Again, the FAIR principles offer guidance: KonsortSWD wants to provide the community (including research data centres) with adequate tools to share and manage data accordingly. Apart from community engagement, data access, ethics and technical solutions, another important task concerns the production of data. This is where our project is located. More specifically, our aim is to contribute to new insights by developing tools for data linkage.

All efforts to utilize research data need to serve the community! Are there any best practices, opportunities and limits? What does the community need from such an infrastructure to be able to use it and profit from it? Community involvement is thus the key to successful data management. If you have any ideas or input on how your projects could profit from “Linking Textual Data”, feel free to reach out to us.