S4 textstat superclass (textstat-class, polmineR)

The textstat class (technically an S4 class) serves as a superclass for the classes features, context, and partition. Usually, the class will not be used directly. It offers a set of standard generic methods (such as head, tail, dim, nrow, colnames) that its child classes inherit. The core of textstat and its child classes is a data.table in the slot stat that holds text statistics for a corpus or a partition.
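The following is a minimal sketch of this superclass design, not the polmineR implementation. It uses hypothetical toy classes (`toystat` as the superclass, `toycount` as a child class) and a base R `data.frame` in place of the `data.table` in slot `stat`, so that it runs without extra packages. A `dim()` method is defined once for the superclass. The child class inherits it, and because base `nrow()` and `ncol()` call `dim()`, they work as well.

```r
library(methods)

# Hypothetical superclass: statistics live in a tabular `stat` slot
# (a data.frame here, standing in for polmineR's data.table).
setClass("toystat", representation(stat = "data.frame", name = "character"))

# Hypothetical child class; it adds nothing and inherits everything.
setClass("toycount", contains = "toystat")

# Define dim() once, on the superclass, by delegating to the `stat` slot.
setMethod("dim", "toystat", function(x) dim(x@stat))

cnt <- new(
  "toycount",
  stat = data.frame(word = c("Arbeit", "Sozial", "Bundesagentur"),
                    count = c(12L, 9L, 15L)),
  name = "toy"
)

is(cnt, "toystat")  # TRUE: a toycount is a toystat
dim(cnt)            # found via the superclass method
nrow(cnt)           # base nrow() calls dim() internally
ncol(cnt)
```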

# S4 method for textstat
name(x)

# S4 method for character
name(x)

# S4 method for textstat
name(x) <- value

# S4 method for textstat
round(x, digits = 2L)

# S4 method for textstat
sort(x, by, decreasing = TRUE)

as.bundle(object, ...)

# S4 method for textstat,textstat
e1 + e2

# S4 method for textstat
subset(x, subset)

# S3 method for textstat
as.data.table(x, ...)

# S4 method for textstat
show(object)

# S4 method for textstat
p_attributes(.Object)

# S4 method for textstat
knit_print(x, options = knitr::opts_chunk, ...)

# S4 method for textstat
get_corpus(x)

# S4 method for textstat
format(x, digits = 2L)

# S4 method for textstat
view(.Object)

Arguments

x	A `textstat` object.
value	A `character` vector to assign as name to slot `name` of a `textstat` class object.
digits	Number of decimal places.
by	Column that serves as the key for sorting.
decreasing	Logical, whether to sort in decreasing order.
object	A `textstat` object.
...	Arguments passed to the `format()` method called on `x`.
e1	A `textstat` object.
e2	Another `textstat` object.
subset	A logical expression indicating elements or rows to keep.
.Object	A `textstat` object.
options	Chunk options.
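The effect of the `by` and `decreasing` arguments of sort() can be illustrated with a base R sketch on a `data.frame`. The helper `sort_stat()` is hypothetical and not the polmineR implementation; the words and `ll` values are taken from the example output in the Examples section.

```r
# Hypothetical helper: order the rows of a table by the column named in `by`.
sort_stat <- function(df, by, decreasing = TRUE) {
  df[order(df[[by]], decreasing = decreasing), , drop = FALSE]
}

stat <- data.frame(
  word = c("ihre", "Bundesagentur", "Sozial"),
  ll = c(41.95489, 91.39629, 59.67241),
  stringsAsFactors = FALSE
)

sort_stat(stat, by = "ll")                      # highest ll first
sort_stat(stat, by = "ll", decreasing = FALSE)  # lowest ll first
```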

Details

The head method returns the first rows of the data.table in the stat-slot. Use argument n to specify the number of rows.

The tail method returns the last rows of the data.table in the stat-slot. Use argument n to specify the number of rows.
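A minimal sketch of how head and tail methods can delegate to the stat-slot, assuming a hypothetical toy class `toystat` whose `stat` slot holds a base R `data.frame` instead of a `data.table`. This illustrates the pattern and is not polmineR's own code.

```r
library(methods)

# Hypothetical toy class with a tabular `stat` slot.
setClass("toystat", representation(stat = "data.frame"))

# head() and tail() delegate to the `stat` slot; `n` sets the number of rows.
setMethod("head", "toystat", function(x, n = 6L, ...) utils::head(x@stat, n = n, ...))
setMethod("tail", "toystat", function(x, n = 6L, ...) utils::tail(x@stat, n = n, ...))

obj <- new("toystat", stat = data.frame(
  word = letters[1:10],
  count = 10:1,
  stringsAsFactors = FALSE
))

head(obj, n = 3)  # first three rows of the stat table
tail(obj, n = 2)  # last two rows of the stat table
```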

The methods dim, nrow and ncol will return information on the dimensions, the number of rows, or the number of columns of the data.table in the stat-slot, respectively.

Objects derived from the textstat class can be indexed with single square brackets ("[") to get rows specified by a numeric or integer vector, and with double square brackets ("[[") to get specific columns from the data.table in the slot stat.
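A minimal sketch of the two kinds of indexing, using a hypothetical toy class `toystat` with a base R `data.frame` in slot `stat` rather than the polmineR classes. Single brackets subset rows and return an object of the same class; double brackets return one column as a plain vector. The words and `ll` values are rounded from the example output in the Examples section.

```r
library(methods)

# Hypothetical toy class; a data.frame stands in for the data.table in `stat`.
setClass("toystat", representation(stat = "data.frame"))

# Single brackets: subset rows of `stat`, return an object of the same class.
setMethod("[", "toystat", function(x, i, j, ..., drop = TRUE) {
  x@stat <- x@stat[i, , drop = FALSE]
  x
})

# Double brackets: return one column of `stat` as a plain vector.
setMethod("[[", "toystat", function(x, i, j, ...) x@stat[[i]])

obj <- new("toystat", stat = data.frame(
  word = c("Bundesagentur", "für", "Sozial", "ihre"),
  ll = c(91.40, 86.51, 59.67, 41.95),
  stringsAsFactors = FALSE
))

top2 <- obj[1:2]   # still a toystat, with two rows
obj[["word"]]      # character vector
sel <- obj[obj[["word"]] %in% c("Arbeit", "Sozial")]
sel@stat
```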

The colnames method returns the column names of the data.table in the slot stat.

The methods as.data.table and as.data.frame extract the data.table in the slot stat as a data.table or a data.frame, respectively.
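A sketch of colnames and as.data.frame methods for a hypothetical toy class `toystat`, with a base R `data.frame` in slot `stat`. An as.data.table method would follow the same pattern but needs the data.table package, so it is left out here. This is an illustration, not the polmineR code.

```r
library(methods)

# Hypothetical toy class with a tabular `stat` slot.
setClass("toystat", representation(stat = "data.frame"))

# colnames() returns the column names of the `stat` table.
setMethod("colnames", "toystat",
          function(x, do.NULL = TRUE, prefix = "col") colnames(x@stat))

# as.data.frame() extracts the `stat` table itself.
setMethod("as.data.frame", "toystat",
          function(x, row.names = NULL, optional = FALSE, ...) x@stat)

obj <- new("toystat", stat = data.frame(word = c("Arbeit", "Sozial"),
                                        count = c(12L, 9L)))

colnames(obj)
df <- as.data.frame(obj)
class(df)
```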

textstat objects can have a name, which can be retrieved and set using name() and name<-, respectively.
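The getter/setter pair can be sketched with a plain S4 generic and a replacement generic. The class `toystat` and the generics defined here are hypothetical stand-ins for the polmineR ones, meant only to show the pattern behind `name(x)` and `name(x) <- value`.

```r
library(methods)

# Hypothetical toy class with a `name` slot.
setClass("toystat", representation(stat = "data.frame", name = "character"))

# Hypothetical getter and replacement generics mirroring name() and `name<-`.
setGeneric("name", function(x) standardGeneric("name"))
setGeneric("name<-", function(x, value) standardGeneric("name<-"))

setMethod("name", "toystat", function(x) x@name)
setMethod("name<-", "toystat", function(x, value) {
  x@name <- value  # modify the copy and return it
  x
})

obj <- new("toystat", stat = data.frame(word = "Arbeit", count = 1L))
name(obj) <- "count_reuters"
name(obj)
```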

The round()-method looks up all numeric columns in the data.table in the stat-slot of the textstat object and rounds values of these columns to the number of decimal places specified by argument digits.
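A base R sketch of the rounding step, assuming a `data.frame` in place of the `data.table` in the stat-slot. The helper `round_numeric()` is hypothetical and not polmineR's implementation; the values are taken from the example output in the Examples section.

```r
# Hypothetical helper: round every numeric column to `digits` decimal places
# and leave all other columns untouched.
round_numeric <- function(df, digits = 2L) {
  is_num <- vapply(df, is.numeric, logical(1L))
  df[is_num] <- lapply(df[is_num], round, digits = digits)
  df
}

stat <- data.frame(
  word = c("Bundesagentur", "für"),
  exp_coi = c(0.10741211, 11.63631205),
  ll = c(91.39629, 86.51204),
  stringsAsFactors = FALSE
)

round_numeric(stat, digits = 2L)
```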

The knit_print method is called by knitr to render `textstat` objects, or objects inheriting from the `textstat` class, as a DataTable htmlwidget when an R Markdown document is rendered as HTML. It will usually be necessary to state "render = knit_print" explicitly in the chunk options. The option `polmineR.pagelength` controls the number of rows displayed in the resulting `htmlwidget`. Note that including htmlwidgets in HTML documents requires that pandoc is installed. To avoid an error, knit_print returns a formatted data.table if pandoc is not available.

The format() method returns a pretty-printed and minimized version of the data.table in the stat-slot of the textstat object: it rounds all numeric columns to the number of decimal places specified by digits and drops all columns with token ids. The return value is a data.table.
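A base R sketch of this behaviour on a `data.frame`. The helper `format_stat()` is hypothetical, and it assumes that token-id columns can be recognized by a name ending in `_id` (such as `word_id` in the example output); the actual rule used by polmineR may differ.

```r
# Hypothetical helper: drop token-id columns (assumed to end in "_id"),
# then round the remaining numeric columns.
format_stat <- function(df, digits = 2L) {
  df <- df[, !grepl("_id$", names(df)), drop = FALSE]
  is_num <- vapply(df, is.numeric, logical(1L))
  df[is_num] <- lapply(df[is_num], round, digits = digits)
  df
}

stat <- data.frame(
  word = c("Bundesagentur", "für"),
  word_id = c(5360L, 66L),
  ll = c(91.39629, 86.51204),
  stringsAsFactors = FALSE
)

format_stat(stat)
```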

Slots

p_attribute: Object of class character, p-attribute of the query.
corpus: A corpus specified by a length-one character vector.
stat: A data.table with statistical information.
name: The name of the object.
annotation_cols: A character vector, column names of data.table in slot stat that are annotations.
encoding: A length-one character vector, the encoding of the corpus.

Examples

use("polmineR")
#> ... activating corpus: GERMAPARLMINI (version: 0.0.1 | build date: 2019-02-23)
#> ... activating corpus: REUTERS
P <- partition("GERMAPARLMINI", date = ".*", p_attribute = "word", regex = TRUE)
#> ... get encoding: latin1
#> ... get cpos and strucs
#> ... getting counts for p-attribute(s): word
y <- cooccurrences(P, query = "Arbeit")

# generics defined in the polmineR package
x <- count("REUTERS", p_attribute = "word")
name(x) <- "count_reuters"
name(x)
#> [1] "count_reuters"
get_corpus(x)
#> [1] "REUTERS"

# Standard generic methods known from data.frames work for objects inheriting
# from the textstat class

head(y)
#>             word word_id count_partition count_coi count_ref     exp_coi
#> 1: Bundesagentur    5360              15        11         4  0.10741211
#> 2:           für      66            1625        55      1570 11.63631205
#> 3:        Sozial   12775               9         7         2  0.06444727
#> 4:             "     493             468        21       447  3.35125787
#> 5:          ihre     597             121        12       109  0.86645770
#> 6:       schafft     675              36         8        28  0.25778907
#>        exp_ref       ll rank_ll
#> 1:   14.892588 91.39629       1
#> 2: 1613.363688 86.51204       2
#> 3:    8.935553 59.67241       3
#> 4:  464.648742 42.65782       4
#> 5:  120.133542 41.95489       5
#> 6:   35.742211 41.32776       6
tail(y)
#>                              word word_id count_partition count_coi count_ref
#> 1:                       reguläre    2987               1         1         0
#> 2: sozialversicherungspflichtigen   13356               1         1         0
#> 3:                           tete    1440               1         1         0
#> 4:                      unsichere   12856               1         3         0
#> 5:                      verhelfen   13422               1         1         0
#> 6:          Überweisungsvorschlag    2473               7         7         0
#>        exp_coi   exp_ref ll rank_ll
#> 1: 0.007160807 0.9928392 NA     518
#> 2: 0.007160807 0.9928392 NA     519
#> 3: 0.007160807 0.9928392 NA     520
#> 4: 0.021482422 2.9785176 NA     521
#> 5: 0.007160807 0.9928392 NA     522
#> 6: 0.050125652 6.9498743 NA     523
nrow(y)
#> [1] 523
ncol(y)
#> [1] 9
dim(y)
#> [1] 523   9
colnames(y)
#> [1] "word"            "word_id"         "count_partition" "count_coi"      
#> [5] "count_ref"       "exp_coi"         "exp_ref"         "ll"             
#> [9] "rank_ll"        

# Use brackets for indexing 

if (FALSE) {
y[1:25]
y[,c("word", "ll")]
y[1:25, "word"]
y[1:25][["word"]]
y[which(y[["word"]] %in% c("Arbeit", "Sozial"))]
y[ y[["word"]] %in% c("Arbeit", "Sozial") ]
}
sc <- partition("GERMAPARLMINI", speaker = "Angela Dorothea Merkel")
#> ... get encoding: latin1
#> ... get cpos and strucs
cnt <- count(sc, p_attribute = c("word", "pos"))
# keep normal nouns (NN) and attributive adjectives (ADJA)
cnt_min <- subset(cnt, pos %in% c("NN", "ADJA"))
# keep proper nouns (NE)
cnt_ne <- subset(cnt, pos == "NE")

# Get statistics in textstat object as data.table
count_dt <- corpus("REUTERS") %>%
  subset(grep("saudi-arabia", places)) %>% 
  count(p_attribute = "word") %>%
  as.data.table()