Get all regions defined by a structural attribute. Unlike get_region_matrix(), which returns a region matrix for a defined subset of strucs, s_attr_regions() returns all regions. For speed, the function reads the binary *.rng file of the structural attribute directly; the Corpus Library (CL) is not used.
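To illustrate what reading the *.rng file directly involves, here is a minimal base-R sketch. It assumes that the file stores consecutive (start, end) pairs of 32-bit integers in big-endian (network) byte order, which is how CWB writes its binary data files. The helper read_rng() is hypothetical and is not the package's own implementation; it decodes a small synthetic file written to a temporary path.

```r
# Hypothetical helper: decode a CWB-style *.rng file with base R.
# Assumption: the file holds consecutive (start, end) pairs of
# 32-bit integers in big-endian (network) byte order.
read_rng <- function(path) {
  n_bytes <- file.size(path)
  con <- file(path, open = "rb")
  on.exit(close(con))
  # Read every 4-byte integer in the file in one call
  cpos <- readBin(con, what = integer(), size = 4L,
                  n = n_bytes / 4L, endian = "big")
  # Consecutive values form (start, end) pairs, so fill row by row
  matrix(cpos, ncol = 2L, byrow = TRUE)
}

# Write a small synthetic file (three regions) to exercise the reader
rng_file <- tempfile(fileext = ".rng")
writeBin(as.integer(c(0, 91, 92, 535, 536, 590)), rng_file,
         size = 4L, endian = "big")

regions <- read_rng(rng_file)
print(regions)
```

Reading the whole file with a single readBin() call and reshaping it avoids any per-region overhead, which is the reason a direct read can be faster than querying regions one by one.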

s_attr_regions(
  corpus,
  s_attr,
  registry = Sys.getenv("CORPUS_REGISTRY"),
  data_dir = corpus_data_dir(corpus = corpus, registry = registry)
)

Arguments

corpus

A length-one character vector with a corpus ID.

s_attr

A length-one character vector naming a structural attribute.

registry

A length-one character vector stating the registry directory (defaults to the value of the CORPUS_REGISTRY environment variable).

data_dir

The data directory of the corpus (by default derived from corpus and registry via corpus_data_dir()).

Value

A two-column matrix with the regions defined by the structural attribute: column 1 holds the left (start) corpus positions and column 2 the right (end) corpus positions of the regions.
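As a sketch of how the returned matrix might be used, the snippet below takes the first three rows of the REUTERS example output as literal data and derives two things from it: the size of each region in tokens (both boundaries are inclusive, as the contiguous rows 0-91, 92-535 suggest) and, for a given corpus position, the row whose region contains it. The helper cpos_to_region() is hypothetical and only illustrates one way to query the matrix with base R.

```r
# Illustrative region matrix: first three rows of the REUTERS example output
regions <- matrix(c(0L, 91L, 92L, 535L, 536L, 590L), ncol = 2L, byrow = TRUE)

# Region size in tokens: left and right positions are both inclusive
region_size <- regions[, 2] - regions[, 1] + 1L

# Hypothetical helper: map corpus positions to the (1-based) row whose
# region contains them. findInterval() returns the index of the last
# start <= cpos; checking the right boundary catches positions that
# fall outside every region.
cpos_to_region <- function(cpos, regions) {
  i <- findInterval(cpos, regions[, 1])
  inside <- i > 0L & cpos <= regions[pmax(i, 1L), 2]
  ifelse(inside, i, NA_integer_)
}

print(region_size)                                        # region sizes: 92, 444, 55
print(cpos_to_region(c(0L, 100L, 590L, 600L), regions))   # rows: 1, 2, 3, NA
```

Because findInterval() relies on the start positions being sorted, this lookup assumes the regions are in corpus order, as they are in the example output.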

Examples

s_attr_regions("REUTERS", s_attr = "id", registry = get_tmp_registry())
#>       [,1] [,2]
#>  [1,]    0   91
#>  [2,]   92  535
#>  [3,]  536  590
#>  [4,]  591  659
#>  [5,]  660  752
#>  [6,]  753 1217
#>  [7,] 1218 1651
#>  [8,] 1652 1815
#>  [9,] 1816 2146
#> [10,] 2147 2495
#> [11,] 2496 2873
#> [12,] 2874 2965
#> [13,] 2966 3070
#> [14,] 3071 3173
#> [15,] 3174 3283
#> [16,] 3284 3432
#> [17,] 3433 3631
#> [18,] 3632 3714
#> [19,] 3715 3996
#> [20,] 3997 4049