Get a data.frame with left and right corpus positions (cpos) for structural attributes and values.
s_attribute_decode(
  corpus,
  data_dir,
  s_attribute,
  encoding = NULL,
  registry = Sys.getenv("CORPUS_REGISTRY"),
  method = c("R", "Rcpp")
)
corpus: A CWB corpus (ID in upper case).
data_dir: The data directory where the binary files of the corpus are stored.
s_attribute: A structural attribute (length-one character vector).
encoding: Encoding of the values ("latin-1" or "utf-8").
registry: The CWB registry directory.
method: A length-one character vector, whether to use the "R" or the "Rcpp" implementation for decoding the structural attribute.
A data.frame with three columns if the s-attribute has values, or two columns if it does not. Column cpos_left holds the start corpus positions of the structural annotations, column cpos_right the end corpus positions. Column value holds the value of each annotation.
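As a rough illustration of how the returned table might be used, the sketch below builds a small data.frame by hand with the documented columns and derives region sizes from it. The positions and values are invented for the example, not taken from any corpus, and the size calculation assumes that both corpus positions of a region are inclusive.

```r
# Hypothetical table shaped like the documented return value (values invented).
regions <- data.frame(
  cpos_left = c(0L, 92L, 200L),
  cpos_right = c(91L, 199L, 349L),
  value = c("usa", "canada", "usa"),
  stringsAsFactors = FALSE
)

# Assuming inclusive start and end positions, a region spans
# cpos_right - cpos_left + 1 tokens.
regions$size <- regions$cpos_right - regions$cpos_left + 1L

# Total number of tokens per annotation value.
tokens_by_value <- tapply(regions$size, regions$value, sum)
```

For an s-attribute without values, the value column would be absent and only the size calculation applies.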
Two approaches are implemented: a pure R solution decodes the files directly in the directory specified by data_dir, whereas the implementation using Rcpp uses the registry file for corpus to find the data directory.
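To make the idea of decoding files directly from a data directory more concrete, here is a minimal sketch in base R. It is not the package's implementation. It assumes that the region file of an s-attribute (here a mock file named places.rng) stores the start and end positions of each region as consecutive 4-byte big-endian integers. The file is written to a temporary directory first so that the example is self-contained, and the positions are invented.

```r
# Create a mock data directory with a small region file (invented positions).
data_dir <- tempfile()
dir.create(data_dir)
rng_file <- file.path(data_dir, "places.rng")
writeBin(c(0L, 91L, 92L, 199L), rng_file, size = 4L, endian = "big")

# Read all 4-byte big-endian integers back; the file size gives the count.
n_int <- file.info(rng_file)[["size"]] / 4L
rng <- readBin(rng_file, what = integer(), size = 4L, n = n_int, endian = "big")

# Assumption: consecutive pairs are the (start, end) positions of one region.
m <- matrix(rng, ncol = 2L, byrow = TRUE)
regions <- data.frame(cpos_left = m[, 1], cpos_right = m[, 2])
```

Reading the files this way needs only the data directory, which is consistent with the pure R approach not depending on the registry file to locate the data.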
library(RcppCWB)

# pure R implementation (Rcpp implementation fails on Windows in vanilla mode)
b <- s_attribute_decode(
  corpus = "REUTERS",
  data_dir = system.file(package = "RcppCWB", "extdata", "cwb", "indexed_corpora", "reuters"),
  registry = get_tmp_registry(),
  s_attribute = "places",
  method = "R"
)

# Using Rcpp wrappers for CWB C code
b <- s_attribute_decode(
  corpus = "REUTERS",
  data_dir = system.file(package = "RcppCWB", "extdata", "cwb", "indexed_corpora", "reuters"),
  s_attribute = "places",
  method = "Rcpp",
  registry = get_tmp_registry()
)