Class to manage subcorpora derived from a CWB corpus.
# S4 method for subcorpus
summary(object)

# S4 method for subcorpus
name(x) <- value

# S4 method for subcorpus
get_corpus(x)

# S4 method for subcorpus
size(x, s_attribute = NULL, ...)
object | A |
---|---|
x | A |
value | A |
s_attribute | A |
... | Arguments passed into |
summary
: Get named list with basic information for subcorpus object.
name<-
: Assign name to a subcorpus object.
get_corpus
: Get the corpus ID from the subcorpus object.
size
: Get the size of a subcorpus object from the respective slot of the object.
s_attributes
A named list with the structural attributes defining the subcorpus.
cpos
A matrix with left and right corpus positions defining regions (two-column matrix with integer values).
annotations
Object of class list.
size
Total size (number of tokens) of the subcorpus object (a length-one integer vector). The value is accessible by calling the size-method on the subcorpus object (see examples).
metadata
Object of class data.frame, metadata information.
strucs
Object of class integer, the strucs defining the subcorpus.
xml
Object of class character, whether the xml is "flat" or "nested".
s_attribute_strucs
Object of class character, the base node.
user
If the corpus on the server requires authentication, the username.
password
If the corpus on the server requires authentication, the password.
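The relationship between the cpos and size slots can be illustrated with a minimal base R sketch. It assumes that the left and right corpus positions of a region are both inclusive, as is usual for CWB regions; the matrix values are invented for illustration and are not taken from any real corpus.

```r
# Hypothetical region matrix: one row per region, columns are the
# left and right corpus positions (both assumed to be inclusive).
cpos <- matrix(
  c(0L, 9L,
    25L, 29L),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("left", "right"))
)

# Number of tokens per region: right - left + 1 because both ends count.
region_sizes <- cpos[, "right"] - cpos[, "left"] + 1L

# Total number of tokens covered by all regions, which is what a
# length-one integer size value would hold under this assumption.
subcorpus_size <- sum(region_sizes)
print(subcorpus_size)
```

In this toy case the two regions cover 10 and 5 tokens respectively.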
Most commonly, a subcorpus is derived from a corpus or a subcorpus using the subset method. See size for detailed documentation on how to use the size-method. The subcorpus class shares many features with the partition class, but it is more parsimonious and does not include information on statistical properties of the subcorpus (i.e. a count table). In line with this logic, the subcorpus class inherits from the corpus class, whereas the partition class inherits from the textstat class.
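What inheritance means in practice can be sketched with two toy S4 classes. The class names toy_corpus and toy_subcorpus and the generic get_corpus_id are hypothetical stand-ins invented for this illustration; they are not the package's actual class definitions, which carry more slots.

```r
library(methods)

# Hypothetical parent class holding only a corpus ID.
setClass("toy_corpus", slots = c(corpus = "character"))

# Hypothetical child class: adds a region matrix, inherits the corpus slot.
setClass("toy_subcorpus", contains = "toy_corpus", slots = c(cpos = "matrix"))

# A generic with a method defined only for the parent class.
setGeneric("get_corpus_id", function(x) standardGeneric("get_corpus_id"))
setMethod("get_corpus_id", "toy_corpus", function(x) x@corpus)

sc <- new(
  "toy_subcorpus",
  corpus = "REUTERS",
  cpos = matrix(c(0L, 9L), ncol = 2)
)

# The child object is also an instance of the parent class, so the
# method written for the parent is dispatched for it.
inherits_from_parent <- is(sc, "toy_corpus")
corpus_id <- get_corpus_id(sc)
```

Because the child class extends the parent, methods written for the parent apply to child objects without being redefined, which is the design the paragraph above describes for subcorpus and corpus.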
Other classes to manage corpora: corpus-class, phrases, regions
library(polmineR)
use("polmineR") # activate the demo corpora shipped with the package

# basic example
r <- corpus("REUTERS")
k <- subset(r, grepl("kuwait", places))
name(k) <- "kuwait"
y <- summary(k)
s <- size(k)

# the same with a magrittr pipe
corpus("REUTERS") %>%
  subset(grepl("kuwait", places)) %>%
  summary()
#> $name
#> [1] ""
#>
#> $size
#> [1] 660
#>

# subsetting a subcorpus in a pipe
stone <- corpus("GERMAPARLMINI") %>%
  subset(date == "2009-11-10") %>%
  subset(speaker == "Frank-Walter Steinmeier")

# perform count for subcorpus
n <- corpus("REUTERS") %>%
  subset(grep("kuwait", places)) %>%
  count(p_attribute = "word")
n <- corpus("REUTERS") %>%
  subset(grep("saudi-arabia", places)) %>%
  count('"Saudi" "Arabia"')

# keyword-in-context analysis (kwic)
k <- corpus("REUTERS") %>%
  subset(grep("kuwait", places)) %>%
  kwic("oil")