Wrappers for CWB Corpus Library functions suited for writing performance-critical code.

s_attr(corpus, s_attribute, registry)

p_attr(corpus, p_attribute, registry)

p_attr_size(p_attr)

s_attr_size(s_attr)

p_attr_lexicon_size(p_attr)

cpos_to_struc(s_attr, cpos)

cpos_to_str(p_attr, cpos)

cpos_to_id(p_attr, cpos)

struc_to_cpos(s_attr, struc)

struc_to_str(s_attr, struc)

regex_to_id(p_attr, regex)

str_to_id(p_attr, str)

id_to_freq(p_attr, id)

id_to_cpos(p_attr, id)

cpos_to_lbound(s_attr, cpos)

cpos_to_rbound(s_attr, cpos)

Arguments

corpus

ID of a CWB corpus (length-one character vector).

s_attribute

A structural attribute (length-one character vector).

registry

Path to the registry directory (length-one character vector).

p_attribute

A positional attribute (length-one character vector).

p_attr

An externalptr referencing a p-attribute.

s_attr

An externalptr referencing an s-attribute.

cpos

An integer vector of corpus positions.

struc

A length-one integer vector with a struc.

regex

A regular expression.

str

A character vector.

id

An integer vector with token ids.

Details

The default cl_* R wrappers for the functions of the CWB Corpus Library look up the corpus and its p- or s-attribute (identified by the corpus ID, registry and attribute name, each a length-one character vector) every time one of these functions is called. It is more efficient to look up an attribute only once. The functions documented here therefore pass "externalptr" objects that reference attributes that have already been looked up. A relevant scenario is writing functions with a C++ implementation that are compiled and linked using Rcpp::cppFunction() or Rcpp::sourceCpp().

Examples

library(Rcpp)

cppFunction(
  'Rcpp::StringVector get_str(
     SEXP corpus,
     SEXP p_attribute,
     SEXP registry,
     Rcpp::IntegerVector cpos
   ){
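     // Look up the p-attribute once; the result is an externalptr.
     SEXP attr = RcppCWB::p_attr(corpus, p_attribute, registry);
     // Decode the corpus positions to token strings via the pointer.
     Rcpp::StringVector result = RcppCWB::cpos_to_str(attr, cpos);
     return result;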
  }',
  depends = "RcppCWB"
)

result <- get_str("REUTERS", "word", RcppCWB::get_tmp_registry(), 0:50)
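The lookup-once pattern described under Details can also be sketched at the R level. This is a minimal sketch, assuming the wrappers listed above are callable from R as shown in the usage section and that the REUTERS sample corpus is available in the temporary registry, as in the example above. The variable names (registry, word, ids, tokens) are illustrative only.

```r
library(RcppCWB)

# Assumption: the REUTERS sample corpus is registered in the temporary registry.
registry <- get_tmp_registry()

# Look up the p-attribute once; `word` is an externalptr.
word <- p_attr("REUTERS", "word", registry)

# Reuse the pointer for several calls without repeating the lookup.
n_tokens <- p_attr_size(word)           # number of tokens in the corpus
ids      <- cpos_to_id(word, 0:50)      # token ids for the first 51 positions
tokens   <- cpos_to_str(word, 0:50)     # decoded strings for the same positions
```

Each call after p_attr() receives the same pointer, so the corpus and attribute are resolved a single time rather than once per call.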