The PolMine project relies heavily on the statistical programming language R. We believe it is a prudent choice: R is great for interactive data analysis and for data visualisation, particularly when used in combination with RStudio as an integrated development environment (IDE).
R is one of the more accessible programming languages. Writing code yourself gives you the flexibility to implement ideas and workflows that are out of reach for the end user of a graphical user interface. This comes at a cost: using R may be a thorny experience at first. As we want to make corpora a useful and productive resource for research, offering tutorials, recipes and how-tos is an integral part of the PolMine Project.
The Using Corpora in Social Science Research (UCSSR) series of teaching and learning resources is meant to meet, step by step, the needs of users of polmineR and of corpora such as GermaParl.
Using the Corpus Workbench (CWB) as a backend for corpus analysis is a fundamental design choice. A thorough understanding of the syntax of the Corpus Query Processor (CQP) may be necessary to make full use of linguistically annotated corpora, but explaining the details of the CQP syntax goes beyond the scope of the documentation of our packages. The CQP tutorial is the traditional resource for learning how to use CQP. If you plan to prepare corpora yourself, the CQP encoding tutorial includes technical explanations that may be very helpful. Another system that uses the CWB as a backend is CQPweb. A series of CQPweb tutorials on YouTube conveys many valuable insights into how CQP can be used.
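To give a first impression of what a CQP query looks like when issued from R, here is a minimal sketch. It assumes that polmineR is installed and that the small REUTERS sample corpus shipped with the package is available; the query itself is only an illustration and no substitute for the CQP tutorial.

```r
# Sketch only: assumes polmineR and its bundled REUTERS sample corpus are installed.
library(polmineR)
use("polmineR")  # activate the sample corpora shipped with the package

# Plain token query: count occurrences of the word "oil"
count("REUTERS", query = "oil")

# CQP query: "oil" followed by any token starting with "price".
# cqp = TRUE tells polmineR to interpret the query as CQP syntax.
count("REUTERS", query = '"oil" "price.*"', cqp = TRUE)

# Keyword-in-context display for the same CQP query
kwic("REUTERS", query = '"oil" "price.*"', cqp = TRUE)
```

In CQP syntax, each quoted string matches one token and is read as a regular expression, which is why the whole query is wrapped in single quotes on the R side.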
Then there is R itself. There are various resources you can use to learn R, such as swirl. But using R productively will involve more than just learning to write R code.
- First, as mentioned initially, the RStudio IDE makes working with R much more comfortable than using R at the command line. The webinar series RStudio Essentials offers a good introduction to the RStudio IDE.
- Second, for doing analyses, writing Rmarkdown documents is a great approach. Rmarkdown lets you combine code and text, and is a good way to put the ideas of literate programming and reproducible research into practice.
- Third, our code is under version control, and we recommend that you start using Git as a version control system early on, particularly when you work in a collaborative setting.
Finally, a resource we would like to mention is R-bloggers. The blogs collected there always convey a wealth of fresh ideas!