A set of functions to parse, create and write registry files.

registry_file_parse(corpus, registry_dir = Sys.getenv("CORPUS_REGISTRY"))

registry_file_compose(x)

registry_data(
  name,
  id,
  home,
  info = fs::path(home, ".info"),
  properties = c(charset = "utf-8"),
  p_attributes,
  s_attributes = character()
)

registry_file_write(
  data,
  corpus,
  registry_dir = Sys.getenv("CORPUS_REGISTRY"),
  ...
)

registry_set_property(data, property, value)

registry_set_info(data, info_file)

registry_set_name(data, name)

Arguments

corpus

ID of a CWB corpus, given as a length-one character vector.

registry_dir

Directory with registry files.

x

An object of class registry_data.

name

Long descriptive name of the corpus.

id

Short name of corpus (character vector).

home

Path to the data directory of the indexed corpus.

info

A character vector containing the path of the info file.

properties

Named character vector with corpus properties; it should include at least 'charset'.

p_attributes

A character vector with positional attributes to declare.

s_attributes

A character vector with structural attributes to declare.

data

A registry_data object.

...

Further parameters.

property

Name of a single corpus property (character vector).

value

Value of a corpus property (character vector).

info_file

Path to the info file providing information on the corpus.

Details

registry_file_parse() will return an object of class registry_data.

See the appendix to the 'Corpus Encoding Tutorial' (https://cwb.sourceforge.io/files/CWB_Encoding_Tutorial.pdf), which includes an explanation of the registry file format.
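For orientation, the core layout of a registry file can be sketched in base R. The helper sketch_registry_lines() below is hypothetical and not part of the package; it only illustrates the main declarations (NAME, ID, HOME, INFO, '##::' property lines, ATTRIBUTE, STRUCTURE) and leaves out the explanatory comments a real registry file usually carries. All values are placeholders.

```r
# Hypothetical helper (not part of the package): build the core lines of a
# CWB registry file from plain R values.
sketch_registry_lines <- function(name, id, home, info, properties,
                                  p_attributes, s_attributes = character()) {
  c(
    sprintf('NAME "%s"', name),
    sprintf("ID   %s", tolower(id)),  # corpus IDs are lowercase in the registry
    sprintf("HOME %s", home),
    sprintf("INFO %s", info),
    "",
    # corpus properties are written as special comment lines
    sprintf('##:: %s = "%s"', names(properties), properties),
    "",
    sprintf("ATTRIBUTE %s", p_attributes),  # one line per positional attribute
    sprintf("STRUCTURE %s", s_attributes)   # one line per structural attribute
  )
}

reg_lines <- sketch_registry_lines(
  name = "Example Corpus",
  id = "EXAMPLE",
  home = "/path/to/data",        # placeholder path
  info = "/path/to/data/.info",  # placeholder path
  properties = c(charset = "utf-8"),
  p_attributes = c("word", "pos"),
  s_attributes = "text"
)
writeLines(reg_lines)
```

The sketch keeps one declaration per line, which is what makes the format easy to parse and to regenerate from a registry_data object.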

registry_file_compose() will turn a registry_data object into a character vector with the content of a registry file that can be written to disk.

registry_file_write() will compose a registry file from data and write it to disk.
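A minimal sketch of creating registry data from scratch and writing it to disk might look as follows. The corpus name, ID and attributes are made up for illustration, and both directories are throwaway locations under tempdir(); it is assumed here that registry_data() does not require an indexed corpus to be present in the data directory.

```r
library(RcppCWB)

# Placeholder data directory for a hypothetical corpus
home_dir <- file.path(tempdir(), "example_data")
dir.create(home_dir, showWarnings = FALSE)

regdata_new <- registry_data(
  name = "Example Corpus",
  id = "example",
  home = home_dir,
  properties = c(charset = "utf-8"),
  p_attributes = c("word", "pos"),
  s_attributes = "text"
)

# Write the registry file into a temporary registry directory
registry_tmp <- file.path(tempdir(), "example_registry")
dir.create(registry_tmp, showWarnings = FALSE)
registry_file_write(
  data = regdata_new,
  corpus = "EXAMPLE",
  registry_dir = registry_tmp
)

# Check which file has been created
list.files(registry_tmp)
```

Writing to a temporary registry first keeps the directory referenced by CORPUS_REGISTRY untouched until the result has been checked.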

registry_set_property() will set a single corpus property.

registry_set_info() will set the path to the info file.

registry_set_name() sets the long descriptive name of the corpus.
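The three setters can be combined as in the following sketch, which starts from the REUTERS sample corpus shipped with the package. It assumes that each setter returns the modified registry_data object, so the result is reassigned; the new name, the property value and the info file path are illustrative only.

```r
library(RcppCWB)

regdata <- registry_file_parse(
  corpus = "REUTERS",
  registry_dir = system.file(package = "RcppCWB", "extdata", "cwb", "registry")
)

# Assumption: setters return the updated object rather than modifying in place
regdata <- registry_set_name(regdata, name = "Reuters (edited)")
regdata <- registry_set_property(regdata, property = "language", value = "en")
regdata <- registry_set_info(regdata, info_file = file.path(tempdir(), ".info"))

# Compose the registry file to see the effect of the changes
regfile <- registry_file_compose(regdata)
grep("Reuters (edited)", regfile, fixed = TRUE, value = TRUE)
```

Nothing is written to disk here; the changes only take effect for CWB tools once the data is passed to registry_file_write().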

Examples

regdata <- registry_file_parse(
  corpus = "REUTERS",
  registry_dir = system.file(package = "RcppCWB", "extdata", "cwb", "registry")
)
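As a possible continuation of this example, the parsed object can be inspected and turned back into the text of a registry file. This is a sketch only; the exact structure printed by str() depends on the package version.

```r
library(RcppCWB)

regdata <- registry_file_parse(
  corpus = "REUTERS",
  registry_dir = system.file(package = "RcppCWB", "extdata", "cwb", "registry")
)

# Inspect the parsed registry data
class(regdata)
str(regdata)

# Turn the object back into the lines of a registry file
regfile <- registry_file_compose(regdata)
head(regfile)
```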