Wrappers for CWB Corpus Library functions suited for writing performance-critical code.
s_attr(corpus, s_attribute, registry)
p_attr(corpus, p_attribute, registry)
p_attr_size(p_attr)
s_attr_size(s_attr)
p_attr_lexicon_size(p_attr)
cpos_to_struc(s_attr, cpos)
cpos_to_str(p_attr, cpos)
cpos_to_id(p_attr, cpos)
struc_to_cpos(s_attr, struc)
struc_to_str(s_attr, struc)
regex_to_id(p_attr, regex)
str_to_id(p_attr, str)
id_to_freq(p_attr, id)
id_to_cpos(p_attr, id)
cpos_to_lbound(s_attr, cpos)
cpos_to_rbound(s_attr, cpos)
corpus: ID of a CWB corpus (length-one character vector).
s_attribute: A structural attribute (length-one character vector).
registry: Registry directory.
p_attribute: A positional attribute (length-one character vector).
p_attr: An externalptr referencing a p-attribute.
s_attr: An externalptr referencing an s-attribute.
cpos: An integer vector of corpus positions.
struc: A length-one integer vector with a struc.
regex: A regular expression.
str: A character vector.
id: An integer vector with token ids.
The default cl_* R wrappers for the functions of the CWB Corpus Library
look up the corpus and its p- or s-attribute (using the corpus ID, registry
and attribute given as length-one character vectors) every time one of these
functions is called. It is more efficient to look up an attribute only once.
The functions documented here therefore pass "externalptr" objects that
reference attributes that have already been looked up. A relevant scenario is
writing functions with a C++ implementation that are compiled and linked using
Rcpp::cppFunction() or Rcpp::sourceCpp().
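The lookup-once pattern described above can also be sketched at the R level. The snippet below is a minimal illustration, not a benchmark: it assumes that the REUTERS sample corpus is reachable through the temporary registry returned by get_tmp_registry(), and that cl_cpos2str() is the default wrapper being compared against. The variable names (registry, cpos, word) are chosen for illustration only.

```r
library(RcppCWB)

# Assumption: REUTERS is available via the temporary registry.
registry <- get_tmp_registry()
cpos <- 0:50

# Default wrapper: corpus and attribute are looked up on every call.
tokens_default <- cl_cpos2str(
  corpus = "REUTERS", p_attribute = "word",
  registry = registry, cpos = cpos
)

# Pointer-based wrappers: look up the p-attribute once ...
word <- p_attr("REUTERS", "word", registry)

# ... and reuse the externalptr for any number of subsequent calls.
tokens_ptr <- cpos_to_str(word, cpos)
n_tokens <- p_attr_size(word)
n_types <- p_attr_lexicon_size(word)
```

Within a single call the saving is small; it is expected to matter when the same attribute is queried many times, for instance inside a loop.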
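library(Rcpp)

# Compile a C++ function that looks up the p-attribute once and then
# decodes the corpus positions using the resulting pointer.
cppFunction(
  'Rcpp::StringVector get_str(
     SEXP corpus,
     SEXP p_attribute,
     SEXP registry,
     Rcpp::IntegerVector cpos
   ){
     SEXP attr;
     Rcpp::StringVector result;
     // single lookup of the attribute
     attr = RcppCWB::p_attr(corpus, p_attribute, registry);
     // decode corpus positions via the pointer
     result = RcppCWB::cpos_to_str(attr, cpos);
     return(result);
   }',
  depends = "RcppCWB"
)

# Decode the first 51 tokens of the REUTERS sample corpus
result <- get_str("REUTERS", "word", RcppCWB::get_tmp_registry(), 0:50)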
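The same pattern applies to structural attributes. The following sketch uses only functions listed in the usage section of this page and calls them from R. It assumes that "id" is an s-attribute of the REUTERS sample corpus and that the corpus is reachable via get_tmp_registry(); both are assumptions to adjust for other corpora. Single corpus positions and strucs are passed so that the sketch does not rely on vectorised behaviour.

```r
library(RcppCWB)

# Assumption: REUTERS is available via the temporary registry.
registry <- get_tmp_registry()

# Assumption: "id" is an s-attribute of REUTERS. Look it up once.
id_attr <- s_attr("REUTERS", "id", registry)

# Number of regions annotated with this s-attribute
n_regions <- s_attr_size(id_attr)

# Struc (region number) that covers corpus position 0
struc <- cpos_to_struc(id_attr, 0L)

# Corpus positions spanned by the first region, and its annotated value
region <- struc_to_cpos(id_attr, 0L)
value <- struc_to_str(id_attr, 0L)

# Region boundaries for corpus position 0
left <- cpos_to_lbound(id_attr, 0L)
right <- cpos_to_rbound(id_attr, 0L)
```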