Class to manage subcorpora derived from a CWB corpus.

# S4 method for subcorpus
summary(object)

# S4 method for subcorpus
name(x) <- value

# S4 method for subcorpus
get_corpus(x)

# S4 method for subcorpus
size(x, s_attribute = NULL, ...)

Arguments

object

A subcorpus object.

x

A subcorpus object.

value

A character vector assigned to the name slot of a subcorpus object.

s_attribute

A character vector with s-attributes (one or more).

...

Further arguments passed into the size method. Retained only for backwards compatibility.
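The s_attribute argument of the size method relates region lengths to s-attribute values. As a rough illustration only, and not polmineR's actual implementation, the base-R sketch below assumes a hypothetical two-column matrix of regions and a hypothetical vector holding the value of an s-attribute called places for each region. The size for each value is then the sum of the lengths of all regions that share that value.

```r
# Hypothetical regions: one row per region, columns are the left and
# right corpus positions (both inclusive).
cpos <- matrix(
  c(  0L,  99L,
    100L, 149L,
    150L, 299L),
  ncol = 2L, byrow = TRUE
)

# Hypothetical value of the s-attribute "places" for each region.
places <- c("kuwait", "usa", "kuwait")

# Tokens per region: both boundaries belong to the region, hence "+ 1L".
region_size <- cpos[, 2L] - cpos[, 1L] + 1L

# Sum the region sizes for each distinct s-attribute value.
size_by_value <- tapply(region_size, places, sum)
size_by_value
```

Grouping by region rather than by token keeps the computation cheap: only one row per region has to be touched, regardless of how many tokens the regions span.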

Methods (by generic)

  • summary: Get a named list with basic information on the subcorpus object.

  • name<-: Assign name to a subcorpus object.

  • get_corpus: Get the corpus ID from the subcorpus object.

  • size: Get the size of a subcorpus object from its size slot.
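To make the replacement and accessor semantics concrete, here is a minimal S4 sketch in base R. The class subcorpus_sketch and the generics sketch_name<- and sketch_size are hypothetical stand-ins, named so that they do not mask polmineR's own name<- and size. They only mimic the idea that name(x) <- value writes to the name slot and that the size method reads the size slot. The value 660 is borrowed from the example output of this page.

```r
library(methods)

# Hypothetical, stripped-down class with only the two slots needed here.
setClass("subcorpus_sketch", slots = c(name = "character", size = "integer"))

# Hypothetical replacement generic and method for the name slot.
setGeneric("sketch_name<-", function(x, value) standardGeneric("sketch_name<-"))
setReplaceMethod("sketch_name", "subcorpus_sketch", function(x, value) {
  x@name <- value
  x  # a replacement method must return the modified object
})

# Hypothetical accessor that simply reads the size slot.
setGeneric("sketch_size", function(x) standardGeneric("sketch_size"))
setMethod("sketch_size", "subcorpus_sketch", function(x) x@size)

k <- new("subcorpus_sketch", name = "", size = 660L)
sketch_name(k) <- "kuwait"
sketch_size(k)
```

Because the replacement method returns the modified object and R rebinds k to it, the call `sketch_name(k) <- "kuwait"` appears to change k in place from the caller's perspective.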

Slots

s_attributes

A named list with the structural attributes defining the subcorpus.

cpos

A two-column integer matrix with the left and right corpus positions of the regions defining the subcorpus.

annotations

A list.

size

Total size (number of tokens) of the subcorpus object, a length-one integer vector. The value is accessible by calling the size method on the subcorpus object (see examples).

metadata

A data.frame with metadata.

strucs

An integer vector with the strucs (indices of the regions of a structural attribute) that define the subcorpus.

xml

A character string indicating whether the XML is "flat" or "nested".

s_attribute_strucs

A character string, the base node.

user

If the corpus on the server requires authentication, the username.

password

If the corpus on the server requires authentication, the password.
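The size slot can be understood in terms of the cpos slot: because the left and right corpus positions of a region are both inclusive, a region spans right - left + 1 tokens, and the subcorpus size is the sum over all rows. The base-R sketch below assumes a hypothetical cpos matrix and assumes that the size slot is consistent with cpos in this way.

```r
# Hypothetical cpos matrix: one row per region, columns are the left and
# right corpus positions (both inclusive).
cpos <- matrix(
  c( 10L,  19L,
     50L,  54L,
    200L, 200L),
  ncol = 2L, byrow = TRUE
)

# Region lengths are 10, 5 and 1 tokens; the total is their sum.
size <- sum(cpos[, 2L] - cpos[, 1L] + 1L)
size
```

The single-token region in the last row shows why the "+ 1L" matters: without it, a region whose left and right positions coincide would contribute zero tokens.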

See also

Most commonly, a subcorpus is derived from a corpus or a subcorpus using the subset method. See size for detailed documentation of the size method. The subcorpus class shares many features with the partition class, but it is more parsimonious and does not include information on statistical properties of the subcorpus (i.e., a count table). In line with this logic, the subcorpus class inherits from the corpus class, whereas the partition class inherits from the textstat class.

Other classes to manage corpora: corpus-class, phrases, regions
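The class relations described above can be illustrated with the methods package. The classes corpus_sketch, subcorpus_sketch, textstat_sketch and partition_sketch, as well as the generic corpus_id, are hypothetical and much slimmer than their polmineR counterparts. The sketch only shows that a method defined for the parent class is inherited by the child class, and that a partition-like class derived from textstat is not a corpus.

```r
library(methods)

# Hypothetical stand-ins for the class hierarchy; real polmineR classes
# carry many more slots.
setClass("corpus_sketch", slots = c(corpus = "character"))
setClass("subcorpus_sketch", contains = "corpus_sketch",
         slots = c(cpos = "matrix"))
setClass("textstat_sketch", slots = c(stat = "data.frame"))
setClass("partition_sketch", contains = "textstat_sketch",
         slots = c(cpos = "matrix"))

# Hypothetical generic: a method defined once for the parent class...
setGeneric("corpus_id", function(x) standardGeneric("corpus_id"))
setMethod("corpus_id", "corpus_sketch", function(x) x@corpus)

sc <- new("subcorpus_sketch", corpus = "REUTERS",
          cpos = matrix(c(0L, 9L), ncol = 2L))

# ...is inherited by the child class.
corpus_id(sc)

# The partition-like class sits in a separate branch of the hierarchy.
extends("partition_sketch", "corpus_sketch")
```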

Examples

library(polmineR)
use("polmineR")
#> ... activating corpus: GERMAPARLMINI (version: 0.0.1 | build date: 2019-02-23)
#> ... activating corpus: REUTERS
# basic example
r <- corpus("REUTERS")
k <- subset(r, grepl("kuwait", places))
name(k) <- "kuwait"
y <- summary(k)
s <- size(k)

# the same with a magrittr pipe
corpus("REUTERS") %>% subset(grepl("kuwait", places)) %>% summary()
#> $name
#> [1] ""
#>
#> $size
#> [1] 660
#>
# subsetting a subcorpus in a pipe
stone <- corpus("GERMAPARLMINI") %>%
  subset(date == "2009-11-10") %>%
  subset(speaker == "Frank-Walter Steinmeier")

# perform count for subcorpus
n <- corpus("REUTERS") %>%
  subset(grepl("kuwait", places)) %>%
  count(p_attribute = "word")
n <- corpus("REUTERS") %>%
  subset(grepl("saudi-arabia", places)) %>%
  count('"Saudi" "Arabia"')

# keyword-in-context analysis (kwic)
k <- corpus("REUTERS") %>%
  subset(grepl("kuwait", places)) %>%
  kwic("oil")
#> ... getting corpus positions
#> ... number of hits: 14
#> ... checking that all p-attributes are available
#> ... getting token id for p-attribute: word
#> ... generating contexts