cwbtools v0.3.3 ‘Hemicycle’ brings Europarl closer.
As an immediate follow-up to cwbtools v0.3.2 “Il Postino”, a new cwbtools version (v0.3.3, “Hemicycle”) just made it to CRAN. The release became necessary to fix errors in CRAN's Solaris and Fedora test environments, which surfaced because v0.3.2 expanded test coverage.
Changes to meet CRAN requirements will not be relevant for most users. So why bother to install the new release? There is one nice new feature: The assumptions made by corpus_install(), the mechanism for installing corpora, have been relaxed in two ways. The binary data directory in a corpus tarball may now also be named “data” (not only “indexed_corpora”), and binary files no longer need to reside in a subdirectory named after the corpus. They can sit directly in the data directory.
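To make the two accepted layouts concrete, here is a minimal sketch in base R. It is not how cwbtools implements the check: find_data_dir() is a hypothetical helper, and the paths in old_layout and new_layout are made up for illustration.

```r
# Hypothetical helper (not part of cwbtools): given the paths listed in a
# corpus tarball, report which accepted name the binary data directory uses.
find_data_dir <- function(paths) {
  # Split every path into its components, e.g. "a/b/c" -> c("a", "b", "c")
  components <- unique(unlist(strsplit(paths, "/", fixed = TRUE)))
  accepted <- c("indexed_corpora", "data")
  hit <- accepted[accepted %in% components]
  # NA if neither accepted directory name occurs in any path
  if (length(hit) == 0L) NA_character_ else hit[1]
}

# Illustrative older layout: binary files in a subdirectory named after the corpus
old_layout <- c(
  "corpus/registry/dickens",
  "corpus/indexed_corpora/dickens/word.corpus"
)

# Illustrative relaxed layout: directory named "data", files directly inside it
new_layout <- c(
  "corpus/registry/dickens",
  "corpus/data/word.corpus"
)

find_data_dir(old_layout)
find_data_dir(new_layout)
```

For a tarball on disk, the paths could be obtained with untar(tarfile, list = TRUE) from base R and passed to the helper in the same way.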
Access to two classic corpora - the Europarl corpus with debates in the European Parliament and the Dickens corpus with Charles Dickens’ novels - is super-easy now: Simply use the following code to download and install both corpora.
library(cwbtools)

# Download and install the Dickens corpus
dickens <- "http://cwb.sourceforge.net/temp/Dickens-1.0.tar.gz"
corpus_install(tarball = dickens)

# Download and install the Europarl corpus
ep <- "http://corpora.linguistik.uni-erlangen.de/demos/download/Europarl3-CWB-2010-02-28.tar.gz"
corpus_install(tarball = ep)
A second note on this release: The functionality that has evolved to install corpora from Web storage, from a repository such as Zenodo and from Amazon S3 supersedes previous ideas to ship (large) corpora in R data packages. The big disadvantage of using R data packages for data dissemination is the need to re-install corpora every time there is a major R update. So cwbtools v0.3.3 takes first steps towards deprecating the functionality that helps users wrap corpora in packages. In this vein, the previous Europarl vignette that explained how to wrap the Europarl corpus into an R data package has been dropped.
But the functionality to download and install Europarl from the Web is there! A nice thing about Europarl is that it is a multilingual resource, making it relevant for many international users. As a tribute to the European Parliament (EP), this release is labelled “Hemicycle”: This is what the plenary chamber of the EP at Strasbourg is called, following a tradition of arranging seats and parliamentary groups in a semicircular shape.