Digitized, politically relevant text is available on an unprecedented scale, and this opens up new horizons for social science research. Turning texts into corpora and acquiring the skills to work productively with vast amounts of textual data will stimulate research on both old and new questions in the social sciences. The purpose of the PolMine Project is to provide the data and the code needed to exploit these opportunities of digitalization for our discipline.


The formula “code is theory” drives what we develop. Valid research findings require that qualitative and quantitative analytical steps be combined seamlessly in an interactive workflow. We see text as linguistic data. We implement these ideas in the statistical programming language R. The R package polmineR is our core package for text analysis, complemented by further packages for corpus preparation.


Our focus is on turning texts issued by public institutions into language resources for research. Our ultimate vision is a digital public archive of democracy. GermaParl, a corpus of parliamentary protocols, is our flagship corpus. We strive for sustainable research data management: a fully reproducible data preparation workflow, standardized TEI-compatible data formats, and the involvement of users in quality management.


Whether you call it computer-assisted content analysis, text mining, computational social science, or digital humanities, there are new techniques and methodologies that students and researchers need to acquire. To help bring research practices into the digital era, we develop “Using Corpora in Social Science Research” (UCSSR), a series of teaching and learning materials that currently consists of a set of slides available via GitHub Pages.


Our substantive research is about politics and policies in pluralist immigration societies. The PolMine Project has benefitted from research projects on associations in the migration policy domain, on parliamentarians with a migration background, and on changing policy agendas and attention structures related to integration policy. In a sense, our code and data are a spin-off of this research, developed to make resources reusable and findings reproducible.


The PolMine Project is an initiative of Andreas Blätte, Professor of Public Policy and Regional Politics. The members of the PolMine Research Group contribute to the project either by providing code and data or by actively using its resources and giving feedback. PolMine is affiliated with the NRW School of Governance at the Institute of Political Science (Department of the Social Sciences, University of Duisburg-Essen). PolMine is a registered category-C centre in the CLARIN network.