The Corpus Workbench (CWB) stores the binary files for
structural and positional attributes in an individual 'data directory'
(referred to by argument
data_dir) for each corpus. The data
directories will typically be subdirectories of a parent directory called
'corpus directory' (argument
corpus_dir). Irrespective of the
location of the data directories, all corpora available on a machine are
described by plain-text registry files stored in a
'registry directory' (referred to by argument
registry_dir). The functionality to manage these directories is used as
auxiliary functionality by higher-level functionality to download and install
corpora.
cwb_corpus_dir(registry_dir, verbose = TRUE)
cwb_registry_dir(verbose = TRUE)
cwb_directories(registry_dir = NULL, corpus_dir = NULL, verbose = TRUE)
create_cwb_directories(prefix = "~/cwb", ask = interactive(), verbose = TRUE)
use_corpus_registry_envvar(registry_dir)
registry_dir: Path to the directory with registry files.
verbose: Logical value, whether to output status messages.
corpus_dir: Path to the directory with data directories for corpora.
prefix: The base path under which the 'registry' and 'indexed_corpora' directories are created.
ask: Logical value, whether to prompt the user before creating directories.
cwb_corpus_dir will make a plausible suggestion for a corpus
directory where data directories for corpora reside. The procedure requires
that the registry directory (argument
registry_dir) is known. If
registry_dir is missing, the registry directory will be
guessed by calling
cwb_registry_dir. The heuristic to detect the
corpus directory is as follows: First, directories in the parent directory
of the registry directory that contain "corpus" or "corpora" are suggested.
If this does not yield a result, the data directories stated in the
registry files are evaluated. If there is one unique parent directory of
data directories (after removing temporary directories and directories
within packages), this unique directory is suggested.
cwb_corpus_dir will return a length-one
character vector with the path of the
suggested corpus directory, or
NULL if the heuristic does not yield a result.
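The two-step heuristic described above can be sketched in base R. This is a simplified illustration, not the cwbtools implementation: guess_corpus_dir is a hypothetical helper, the data directories are passed in as an argument instead of being read from registry files, the filtering of temporary and package directories is omitted, and returning only the first match in step one is an assumption.

```r
# Hypothetical helper (not part of cwbtools): simplified corpus directory guess.
guess_corpus_dir <- function(registry_dir, data_dirs = character()) {
  parent <- dirname(normalizePath(registry_dir, mustWork = TRUE))

  # Step 1: look for directories next to the registry directory whose
  # name contains "corpus" or "corpora".
  siblings <- list.dirs(parent, full.names = TRUE, recursive = FALSE)
  hits <- siblings[grepl("corpus|corpora", basename(siblings))]
  if (length(hits) > 0L) return(hits[1L]) # assumption: take the first match

  # Step 2: fall back to the parent directories of the data directories;
  # suggest one only if it is unique.
  parents <- unique(dirname(data_dirs))
  if (length(parents) == 1L) return(parents)

  NULL
}

# Demo layout in a temporary directory.
root <- file.path(tempdir(), "cwb_demo")
dir.create(file.path(root, "registry"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "indexed_corpora"), showWarnings = FALSE)
suggestion <- guess_corpus_dir(file.path(root, "registry"))
```

Here the first step already succeeds, because 'indexed_corpora' sits next to the registry directory and its name contains "corpora".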
cwb_registry_dir will return the system registry
directory. By default, the environment variable CORPUS_REGISTRY defines the
system registry directory. If the polmineR-package is loaded, a temporary
registry directory is used, replacing the system registry directory. In
this case, cwb_registry_dir will retrieve the directory from the
option 'polmineR.corpus_registry'. The return value is a length-one
character vector or
NULL, if no registry directory can be detected.
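The lookup order described above might look roughly as follows in base R. lookup_registry_dir is a hypothetical helper, not the function shipped with cwbtools, and falling back to the environment variable when the polmineR option is unset is an assumption.

```r
# Hypothetical helper (not part of cwbtools): find the registry directory.
lookup_registry_dir <- function() {
  # If polmineR is loaded, prefer the registry stored in its option.
  if (isNamespaceLoaded("polmineR")) {
    opt <- getOption("polmineR.corpus_registry")
    if (!is.null(opt)) return(opt)
  }

  # Otherwise use the environment variable CORPUS_REGISTRY, if it is set.
  envvar <- Sys.getenv("CORPUS_REGISTRY", unset = "")
  if (nzchar(envvar)) return(envvar)

  # No registry directory could be detected.
  NULL
}
```

Returning NULL instead of an empty string lets callers test the result with is.null(), matching the return value documented above.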
cwb_directories will return a named character vector with the
registry directory and the corpus directory.
create_cwb_directories will create a 'registry' and an
'indexed_corpora' directory as subdirectories of the directory indicated by
prefix. Argument ask indicates whether to create
directories, and whether user feedback is asked for before creating the
directories. The function returns a named character vector with the
registry and the corpus directory.
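For illustration, the directory layout that results can be reproduced with base R alone. make_cwb_dirs is a hypothetical stand-in that skips the interactive prompt; the names of the returned vector (registry_dir, corpus_dir) are an assumption, chosen to mirror the argument names.

```r
# Hypothetical helper (not part of cwbtools): create the two CWB directories.
make_cwb_dirs <- function(prefix) {
  dirs <- c(
    registry_dir = file.path(prefix, "registry"),
    corpus_dir = file.path(prefix, "indexed_corpora")
  )
  # Create both directories, including the prefix if it does not exist yet.
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  dirs
}

# Use a temporary prefix so the demo does not touch the home directory.
dirs <- make_cwb_dirs(file.path(tempdir(), "cwb"))
```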
use_corpus_registry_envvar is a convenience function that
will assist users to define the environment variable CORPUS_REGISTRY in the
.Renviron file, making it available across sessions. The function is
intended to be used in an interactive R session. An error is thrown if this
is not the case. The user will be prompted whether the cwbtools package
shall take care of creating / modifying the .Renviron-file. If not,
the file will be opened for manual modification with some instructions shown
in the terminal.
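The kind of modification made to the .Renviron file can be sketched as below. set_registry_in_renviron is a hypothetical helper, not the cwbtools function: it does not prompt the user, and it writes to a file in the temporary directory by default so that the real .Renviron file is left untouched.

```r
# Hypothetical helper (not part of cwbtools): define CORPUS_REGISTRY in a file.
set_registry_in_renviron <- function(registry_dir,
                                     renviron = file.path(tempdir(), ".Renviron")) {
  lines <- if (file.exists(renviron)) readLines(renviron, warn = FALSE) else character()
  # Drop any existing definition so that the variable is defined only once.
  lines <- lines[!grepl("^[[:space:]]*CORPUS_REGISTRY[[:space:]]*=", lines)]
  lines <- c(lines, sprintf("CORPUS_REGISTRY=%s", registry_dir))
  writeLines(lines, renviron)
  invisible(renviron)
}
```

R reads the .Renviron file at startup, so a change of this kind only takes effect in new sessions; Sys.setenv(CORPUS_REGISTRY = registry_dir) would set the variable for the running session.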