Chapter 1 Introduction

1.1 The MigTex Project

MigParl is an indexed and linguistically annotated corpus of speeches on migration and integration affairs in Germany’s regional parliaments (“Landtage”). The corpus has been prepared in the MigTex Project (principal investigators: Andreas Blätte / University of Duisburg-Essen, Ruud Koopmans / Berlin Social Science Center), using the resources and the infrastructure of the PolMine Project.

MigTex was part of a larger joint project to establish the research community of the German Centre for Integration and Migration Research (Deutsches Zentrum für Integrations- und Migrationsforschung / DeZIM). Funding awarded by Germany’s Federal Ministry for Family Affairs, Senior Citizens, Women and Youth (Bundesministerium für Familie, Senioren, Frauen und Jugend / BMFSFJ) is gratefully acknowledged.

1.2 The MigParl Corpus

The MigParl corpus, a comprehensively annotated corpus of plenary speech, follows in the footsteps of the GermaParl corpus of plenary protocols of the German Bundestag, which was developed in the PolMine Project. As such, it shares GermaParl’s motivation and general purpose (see Blätte and Blessing 2018: 810). Because the use cases and requirements of a corpus of plenary protocols are described in depth in Blätte and Blessing (2018), this document focuses on the specificities of MigParl itself.

For most regional states, the MigParl corpus covers the period from January 2000 to December 2018. While 14 of the 16 regional states provide data for roughly this entire period, no processable protocols are available for Rhineland-Palatinate before May 2001 and for Saarland before September 2004. As a thematically specialized corpus, MigParl does not contain all debates held in Germany’s regional state parliaments, but only those speeches that are relevant for migration and integration research. The following chapter, “Corpus Preparation and Selection Strategy”, describes how the corpus was prepared and how relevant speeches were selected.

MigParl is made available as a linguistically annotated, CWB-indexed corpus. The annotation layers of the current version of MigParl are presented in the subsequent section “Data Overview” and in the corresponding data report in chapter 2. The corpus is provided in the form of a tar archive stored in the open-access repository Zenodo. From there, it can be downloaded and used either directly with the Corpus Workbench (CWB) or within the polmineR analysis environment. The chapter “Using MigParl” elaborates on the second option.

Over the course of the project, three versions of the corpus were released at different points in time. These versions differ in their preparation and data quality as well as in their substantive content. In the interest of sustainable and reproducible research, all three versions of the MigParl corpus remain available. The following presentation of the data refers to the final version of the corpus (version 2020.01.27). We provide data reports and preparation documentation for all versions of the corpus; information on the older versions can be found in the annex.


Blätte, Andreas, and Andre Blessing. 2018. “The GermaParl Corpus of Parliamentary Protocols.” In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), edited by Nicoletta Calzolari (Conference chair), Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, et al. Miyazaki, Japan: European Language Resources Association (ELRA).