The S4 subcorpus class.

Class to manage subcorpora derived from a CWB corpus.

# S4 method for subcorpus
summary(object)

# S4 method for subcorpus
name(x) <- value

# S4 method for subcorpus
get_corpus(x)

# S4 method for subcorpus
size(x, s_attribute = NULL, ...)

Arguments

object	A `subcorpus` object.
x	A `subcorpus` object.
value	A `character` vector assigned to the `name` slot of the `subcorpus` object.
s_attribute	A `character` vector with s-attributes (one or more).
...	Further arguments passed to the `size` method. Used only to maintain backwards compatibility.

Methods (by generic)

summary: Get named list with basic information for subcorpus object.
name<-: Assign name to a subcorpus object.
get_corpus: Get the corpus ID from the subcorpus object.
size: Get the size of a subcorpus object from the respective slot of the object.

Slots

s_attributes: A named list with the structural attributes defining the subcorpus.
cpos: A two-column integer matrix with the left and right corpus positions of the regions that define the subcorpus (one row per region).
annotations: Object of class list.
size: Total size (number of tokens) of the subcorpus object (a length-one integer vector). The value is accessible by calling the `size` method on the `subcorpus` object (see examples).
metadata: Object of class data.frame, metadata information.
strucs: Object of class integer, the strucs defining the subcorpus.
xml: Object of class character, whether the xml is "flat" or "nested".
s_attribute_strucs: Object of class character, the base node.
user: If the corpus on the server requires authentication, the username.
password: If the corpus on the server requires authentication, the password.
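
The relation between the `cpos` and `size` slots can be illustrated with a minimal base R sketch. It does not use polmineR and is not the package's implementation. The region matrix and its column names (`cpos_left`, `cpos_right`) are hypothetical, and the sketch assumes that both the left and the right corpus position belong to a region, so that a region from 0 to 9 covers ten tokens.

```r
# Hypothetical region matrix: one row per region, holding the left and the
# right corpus position (assumed inclusive on both sides).
cpos <- matrix(
  c(0L, 9L,
    25L, 29L,
    100L, 149L),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("cpos_left", "cpos_right"))
)

# Tokens per region: both boundaries are counted, hence the + 1L.
region_sizes <- cpos[, "cpos_right"] - cpos[, "cpos_left"] + 1L

# Total number of tokens covered by all regions.
total_size <- sum(region_sizes)

print(region_sizes)
print(total_size)
```

Under these assumptions, the total is the kind of value one would expect to find in the `size` slot of a subcorpus made up of the same regions.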

Examples

library(polmineR)
use("polmineR")
#> ... activating corpus: GERMAPARLMINI (version: 0.0.1 | build date: 2019-02-23)
#> ... activating corpus: REUTERS

# basic example 
r <- corpus("REUTERS")
k <- subset(r, grepl("kuwait", places))
name(k) <- "kuwait"
y <- summary(k)
s <- size(k)

# the same with a magrittr pipe
corpus("REUTERS") %>%
  subset(grepl("kuwait", places)) %>%
  summary()
#> $name
#> [1] ""
#> 
#> $size
#> [1] 660
#> 
  
# subsetting a subcorpus in a pipe
stone <- corpus("GERMAPARLMINI") %>%
  subset(date == "2009-11-10") %>%
  subset(speaker == "Frank-Walter Steinmeier")

# perform count for subcorpus
n <- corpus("REUTERS") %>% subset(grep("kuwait", places)) %>% count(p_attribute = "word")
n <- corpus("REUTERS") %>% subset(grep("saudi-arabia", places)) %>% count('"Saudi" "Arabia"')
  
# keyword-in-context analysis (kwic)   
k <- corpus("REUTERS") %>% subset(grep("kuwait", places)) %>% kwic("oil")
#> ... getting corpus positions
#> ... number of hits: 14
#> ... checking that all p-attributes are available
#> ... getting token id for p-attribute: word
#> ... generating contexts
