The data format of the Corpus Workbench (CWB) allows nested XML as import data. Auxiliary functions help to detect whether two structural attributes are nested or at the same level (i.e. define the same regions).
s_attr_is_descendent(
  x,
  y,
  corpus,
  registry = Sys.getenv("CORPUS_REGISTRY"),
  sample = NULL
)
s_attr_is_sibling(x, y, corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
s_attr_relationship(x, y, corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
x
A structural attribute, stated as a length-one character vector.
y
Another structural attribute, stated as a length-one character vector.
corpus
A corpus ID (length-one character vector).
registry
The directory with the registry file for the corpus.
sample
An integer vector with a sample number of strucs to evaluate. Evaluating only a sample may be an efficient choice for large corpora. If NULL (default), all strucs are evaluated.
s_attr_is_descendent() evaluates whether s-attribute x is a child of s-attribute y. The return value is TRUE (a single logical value) if all regions defined by x are within the regions defined by y. If not, FALSE is returned. The return value is also FALSE if all regions of x and y are identical. In this case the attributes are siblings and not in an ancestor-descendent relationship.
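The containment rule can be illustrated without a CWB corpus. The following is a minimal sketch in base R, assuming each structural attribute is represented as a two-column integer matrix of left and right corpus positions. The helper regions_within() and the toy matrices are hypothetical and not part of the package, whose actual implementation may differ.

```r
# Hypothetical helper: TRUE if every region in `x` lies within some region
# of `y`, and the two sets of regions are not identical.
regions_within <- function(x, y) {
  # Identical regions indicate siblings, not a descendent relationship
  if (identical(x, y)) return(FALSE)
  contained <- vapply(
    seq_len(nrow(x)),
    function(i) any(y[, 1] <= x[i, 1] & x[i, 2] <= y[, 2]),
    logical(1)
  )
  all(contained)
}

# Toy data: two sentences inside one paragraph (corpus positions 0 to 9)
sentences <- matrix(c(0L, 4L, 5L, 9L), ncol = 2, byrow = TRUE)
paragraphs <- matrix(c(0L, 9L), ncol = 2, byrow = TRUE)

regions_within(sentences, paragraphs)  # TRUE
regions_within(paragraphs, sentences)  # FALSE
regions_within(sentences, sentences)   # FALSE
```

The third call shows why identical regions are treated separately: every region is trivially contained in itself, so the identity check has to come first.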
s_attr_is_sibling() tests whether the regions defined for structural attribute x and structural attribute y are identical. If yes, TRUE is returned, assuming that both attributes are at the same level (siblings). If not, FALSE is returned.
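As a rough illustration of the sibling test, the sketch below compares two region matrices in base R. It assumes that regions are given as two-column integer matrices of left and right corpus positions. The helper regions_identical() and the toy data are hypothetical and only mirror the idea described above, not the package internals.

```r
# Hypothetical helper: TRUE if both attributes define exactly the same regions
regions_identical <- function(x, y) {
  # Different dimensions mean a different number of regions
  identical(dim(x), dim(y)) && all(x == y)
}

# Toy data: `ids` and `places` share their regions, `sections` spans both
ids <- matrix(c(0L, 9L, 10L, 19L), ncol = 2, byrow = TRUE)
places <- matrix(c(0L, 9L, 10L, 19L), ncol = 2, byrow = TRUE)
sections <- matrix(c(0L, 19L), ncol = 2, byrow = TRUE)

regions_identical(ids, places)    # TRUE
regions_identical(ids, sections)  # FALSE
```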
s_attr_relationship() returns 0 if s-attributes x and y are siblings in the sense that they define identical regions. The return value is 0 if x is an ancestor of y and 1 if x is a descendent of y.
s_attr_is_descendent("id", "places", corpus = "REUTERS", registry = get_tmp_registry())
#> [1] FALSE
s_attr_is_sibling(x = "id", y = "places", corpus = "REUTERS", registry = get_tmp_registry())
#> [1] TRUE