The Corpus Workbench (CWB) is a classic indexing and query engine for working efficiently with large, linguistically annotated corpora. The cwbtools package offers a set of tools to conveniently create, modify and manage CWB indexed corpora from within R. It complements R packages that use the CWB as a backend for text mining, namely the RcppCWB package for low-level access to CWB indexed corpora, and polmineR as a toolset to implement common text mining workflows.
The package is available via CRAN and can be installed as follows on Windows, macOS and Linux.
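The installation command itself is not shown here. A CRAN package is normally installed with base R's install.packages(), so the following sketch should work on all three platforms, assuming a CRAN mirror is reachable:

```r
# Install the released version of cwbtools from CRAN
install.packages("cwbtools")
```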
To install the development version of the package, use the installation mechanism offered by the remotes package. On Windows, an installation of Rtools may be necessary.
# Make sure the remotes package is present
if (!"remotes" %in% installed.packages()[,"Package"]) install.packages("remotes")
Sys.setenv(R_REMOTES_STANDALONE = "true")
remotes::install_github("PolMine/cwbtools", ref = "dev", force = TRUE)
The default approach to install the development version of cwbtools from GitHub would be devtools::install_github("PolMine/cwbtools", ref = "dev"). However, because both devtools and cwbtools depend on the curl package, nerve-racking problems may arise if curl can be updated: if a newer version of curl is available, the user is prompted whether the update should be performed. Most users will agree. However, the update will fail because curl has already been loaded by devtools, and the part of the curl package that is in use (its dynamic library) cannot be deleted or replaced.
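The failure described above depends on whether curl is already loaded in the running R session. The following minimal sketch checks this with base R's loadedNamespaces(); the helper safe_to_update() is hypothetical and not part of cwbtools, devtools or remotes:

```r
# Hypothetical helper (not part of cwbtools, devtools or remotes):
# a package whose namespace is already loaded in the current session
# should not be updated from within that session, because its dynamic
# library is in use and cannot be replaced.
safe_to_update <- function(pkg) {
  !pkg %in% loadedNamespaces()
}

if (safe_to_update("curl")) {
  message("curl is not loaded: it can be updated in this session.")
} else {
  message("curl is already loaded: restart R before updating it.")
}
```

If you prefer to stay with devtools, one way to respect the required order is to start a fresh R session, update curl with install.packages("curl") before devtools is loaded, and only then call devtools::install_github().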
To avoid having to perform manual updates in the correct order, using the original install_github() function of the remotes package is recommended. When the environment variable R_REMOTES_STANDALONE is set to "true" (as in the snippet above), the remotes package runs in standalone mode and relies on a minimal set of additional packages; in particular, it does not use curl. This avoids the situation described above, which may make the installation of cwbtools difficult for many users.