Class to manage subcorpora derived from a CWB corpus.

    # S4 method for subcorpus
    summary(object)

    # S4 method for subcorpus
    name(x) <- value

    # S4 method for subcorpus
    get_corpus(x)

    # S4 method for subcorpus
    size(x, s_attribute = NULL, ...)
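
| Argument | Description |
|---|---|
| `object` | A `subcorpus` object. |
| `x` | A `subcorpus` object. |
| `value` | A `character` vector, the name to assign to the `subcorpus` object. |
| `s_attribute` | A `character` vector with one or more structural attributes used to determine the size of the subcorpus. |
| `...` | Further arguments passed into the `size` method. |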

Methods:

- `summary`: Get a named list with basic information on the `subcorpus` object.
- `name<-`: Assign a name to a `subcorpus` object.
- `get_corpus`: Get the corpus ID from the `subcorpus` object.
- `size`: Get the size of a `subcorpus` object from the respective slot of the object.
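
The four methods listed above can be illustrated with a minimal S4 sketch. The class `toySubcorpus` and its slots are hypothetical simplifications, not polmineR's actual definitions. The sketch only shows how a `summary` method returning a named list, a replacement method such as `name<-`, and simple accessors such as `get_corpus` and `size` fit together on one S4 class.

```r
library(methods)

# Hypothetical, simplified stand-in for the subcorpus class (not polmineR code).
# Run in a fresh session: the generics defined here would mask polmineR's.
setClass(
  "toySubcorpus",
  slots = c(corpus = "character", name = "character", cpos = "matrix", size = "integer")
)

# summary: named list with basic information (base summary() becomes an S4 generic)
setMethod("summary", "toySubcorpus", function(object, ...) {
  list(name = object@name, size = object@size)
})

# name<-: replacement generics take (x, value) and return the modified object
setGeneric("name<-", function(x, value) standardGeneric("name<-"))
setReplaceMethod("name", "toySubcorpus", function(x, value) {
  x@name <- value
  x
})

# get_corpus: the ID of the corpus the subcorpus was derived from
setGeneric("get_corpus", function(x) standardGeneric("get_corpus"))
setMethod("get_corpus", "toySubcorpus", function(x) x@corpus)

# size: read the precomputed value from the size slot
setGeneric("size", function(x, ...) standardGeneric("size"))
setMethod("size", "toySubcorpus", function(x, ...) x@size)

# Two regions: positions 0-9 and 20-24, i.e. 10 + 5 = 15 tokens
sc <- new(
  "toySubcorpus",
  corpus = "REUTERS", name = "",
  cpos = matrix(c(0L, 9L, 20L, 24L), ncol = 2, byrow = TRUE),
  size = 15L
)
name(sc) <- "kuwait"
str(summary(sc))
```

Storing the size in a slot, rather than recomputing it on every call, keeps the `size` accessor cheap; the trade-off is that the slot must be filled correctly when the object is created.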

Slots:

- `s_attributes`: A named list with the structural attributes defining the subcorpus.
- `cpos`: A two-column matrix of integer values with the left and right corpus positions of the regions that make up the subcorpus.
- `annotations`: Object of class `list`.
- `size`: Total size (number of tokens) of the subcorpus object (a length-one `integer` vector). The value is accessible by calling the `size` method on the `subcorpus` object (see examples).
- `metadata`: Object of class `data.frame`, metadata information.
- `strucs`: Object of class `integer`, the strucs defining the subcorpus.
- `xml`: Object of class `character`, whether the XML is "flat" or "nested".
- `s_attribute_strucs`: Object of class `character`, the base node.
- `user`: If the corpus on the server requires authentication, the username.
- `password`: If the corpus on the server requires authentication, the password.
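
The relationship between the `strucs`, `cpos` and `size` slots can be sketched in base R. The region table below is made-up toy data, and the selection step only mimics conceptually what `subset(r, grepl("kuwait", places))` achieves; it is an assumption for illustration, not how polmineR queries the CWB internally. Corpus positions are treated as inclusive, so a region from `left` to `right` covers `right - left + 1` tokens.

```r
# Hypothetical toy data: one row per struc (region of a structural attribute)
regions <- data.frame(
  struc = 0:3,
  left = c(0L, 120L, 300L, 410L),
  right = c(119L, 299L, 409L, 599L),
  places = c("kuwait", "usa", "kuwait saudi-arabia", "japan")
)

# Select regions whose metadata match, conceptually like subset(r, grepl("kuwait", places))
keep <- grepl("kuwait", regions$places)
strucs <- regions$struc[keep]

# Two-column integer matrix with left and right corpus positions
cpos <- as.matrix(regions[keep, c("left", "right")])
dimnames(cpos) <- NULL

# Size: sum of region lengths, with both boundaries included
n_tokens <- sum(cpos[, 2] - cpos[, 1] + 1L)
n_tokens
```

Here two regions match (strucs 0 and 2), spanning 120 and 110 tokens respectively, so the subcorpus size is 230.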

Most commonly, a `subcorpus` is derived from a `corpus` or from another `subcorpus` using the `subset` method. See `size` for detailed documentation on how to use the `size` method. The `subcorpus` class shares many features with the `partition` class, but it is more parsimonious: it does not include information on statistical properties of the subcorpus (i.e. a count table). In line with this logic, the `subcorpus` class inherits from the `corpus` class, whereas the `partition` class inherits from the `textstat` class.
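
The inheritance described above can be expressed with the `contains` argument of `setClass()`. The class names and slots below are hypothetical skeletons, not polmineR's definitions. They only show that a subcorpus-like object *is* a corpus-like object, while a partition-like object is a textstat-like object that carries a count table.

```r
library(methods)

# Hypothetical skeleton classes; slot names are assumptions for illustration.
setClass("toyCorpus", slots = c(corpus = "character"))
setClass("toyTextstat", slots = c(stat = "data.frame"))

# subcorpus-like: extends the corpus class, no count table
setClass("toySubcorpusLite", contains = "toyCorpus",
         slots = c(cpos = "matrix", size = "integer"))

# partition-like: extends the textstat class, whose stat slot holds the count table
setClass("toyPartition", contains = "toyTextstat",
         slots = c(corpus = "character", cpos = "matrix", size = "integer"))

sc <- new("toySubcorpusLite", corpus = "REUTERS")
is(sc, "toyCorpus")                     # inherited from the corpus-like class
is(sc, "toyTextstat")                   # not a textstat-like object
extends("toyPartition", "toyTextstat")  # partition-like class is textstat-like
```

Because the subcorpus-like class extends the corpus-like class, any method defined for the parent also dispatches on it, which is one reason why a parsimonious subcorpus can reuse corpus methods.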

Other classes to manage corpora: `corpus-class`, `phrases`, `regions`.

    library(polmineR)
    library(magrittr) # provides the %>% pipe
    use("polmineR")   # activate the sample corpora REUTERS and GERMAPARLMINI

    # basic example
    r <- corpus("REUTERS")
    k <- subset(r, grepl("kuwait", places))
    name(k) <- "kuwait"
    y <- summary(k)
    s <- size(k)

    # the same with a magrittr pipe
    corpus("REUTERS") %>%
      subset(grepl("kuwait", places)) %>%
      summary()
    #> $name
    #> [1] ""
    #>
    #> $size
    #> [1] 660

    # subsetting a subcorpus in a pipe
    stone <- corpus("GERMAPARLMINI") %>%
      subset(date == "2009-11-10") %>%
      subset(speaker == "Frank-Walter Steinmeier")

    # perform count for subcorpus
    n <- corpus("REUTERS") %>%
      subset(grepl("kuwait", places)) %>%
      count(p_attribute = "word")
    n <- corpus("REUTERS") %>%
      subset(grepl("saudi-arabia", places)) %>%
      count('"Saudi" "Arabia"')

    # keyword-in-context analysis (kwic)
    k <- corpus("REUTERS") %>%
      subset(grepl("kuwait", places)) %>%
      kwic("oil")