A set of functions to parse, create and write registry files.
registry_file_parse(corpus, registry_dir = Sys.getenv("CORPUS_REGISTRY"))
registry_file_compose(x)
registry_data(
  name,
  id,
  home,
  info = fs::path(home, ".info"),
  properties = c(charset = "utf-8"),
  p_attributes,
  s_attributes = character()
)
registry_file_write(
  data,
  corpus,
  registry_dir = Sys.getenv("CORPUS_REGISTRY"),
  ...
)
registry_set_property(data, property, value)
registry_set_info(data, info_file)
registry_set_name(data, name)
corpus: A CWB corpus indicated by a length-one character vector.
registry_dir: Directory with registry files.
x: An object of class registry_data.
name: Long descriptive name of the corpus.
id: Short name of the corpus (character vector).
home: Path to the data directory of the indexed corpus.
info: A character vector containing the path of the info file.
properties: Named character vector with corpus properties; should at least include 'charset'.
p_attributes: A character vector with positional attributes to declare.
s_attributes: A character vector with structural attributes to declare.
data: A registry_data object.
...: Further parameters.
property: A single corpus property (character vector).
value: Value of a corpus property (character vector).
info_file: Path to the info file providing information on the corpus.
registry_file_parse() will return an object of class registry_data.
See the appendix to the 'Corpus Encoding Tutorial' (https://cwb.sourceforge.io/files/CWB_Encoding_Tutorial.pdf), which includes an explanation of the registry file format.
registry_file_compose() will turn a registry_data object into a character vector with a registry file that can be written to disk.
registry_file_write() will compose a registry file from data and write it to disk.
registry_set_property() will set a single corpus property.
registry_set_info() will set the path to the info file.
registry_set_name() sets the long descriptive name of the corpus.
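As a minimal sketch of how the constructor, the setters and registry_file_compose() might fit together, the snippet below builds a registry_data object from scratch and renders it as registry file lines. It rests on several assumptions: that the functions are exported by the cwbtools package (the package is not named above), that each setter returns the modified registry_data object, and that an existing property such as 'charset' can be overwritten. The corpus name, id and attributes are hypothetical, and tempdir() merely stands in for a real data directory.

```r
library(cwbtools) # assumption: the package exporting these functions

# Hypothetical corpus description; tempdir() stands in for a data directory.
regdata <- registry_data(
  name = "Demo corpus",
  id = "demo",
  home = tempdir(),
  p_attributes = c("word", "pos"),
  s_attributes = "text"
)

# Assumed to return the modified object, hence the reassignment.
regdata <- registry_set_name(regdata, name = "Demo corpus (revised)")
regdata <- registry_set_property(regdata, property = "charset", value = "latin1")

# Turn the object into the lines of a registry file and print them.
regfile <- registry_file_compose(regdata)
writeLines(regfile)
```

Printing the composed lines before calling registry_file_write() is a cheap way to check the declaration against the registry file format described in the tutorial.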
regdata <- registry_file_parse(
  corpus = "REUTERS",
  registry_dir = system.file(package = "RcppCWB", "extdata", "cwb", "registry")
)
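
# A possible continuation (a sketch, not part of the original example):
# write the parsed data to a scratch registry directory and parse it again.
# Assumptions: registry_file_write() and registry_file_parse() agree on the
# file name derived from `corpus`; `tmp_registry` is a hypothetical directory.
tmp_registry <- file.path(tempdir(), "registry")
dir.create(tmp_registry, showWarnings = FALSE)
registry_file_write(
  data = regdata,
  corpus = "REUTERS",
  registry_dir = tmp_registry
)
regdata2 <- registry_file_parse(corpus = "REUTERS", registry_dir = tmp_registry)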