Source: R/S4classes.R, R/textstat.R, R/corpus.R, and 2 more
textstat-class.Rd
The textstat class (technically an S4 class) serves as a superclass for the classes features, context, and partition.
Usually, the class will not be used directly. It offers a set of standard generic methods (such as head, tail, dim, nrow, colnames) that its child classes inherit. The core feature of textstat and its child classes is a data.table in the slot stat that keeps data on the text statistics of a corpus or a partition.
# S4 method for textstat
name(x)

# S4 method for character
name(x)

# S4 method for textstat
name(x) <- value

# S4 method for textstat
round(x, digits = 2L)

# S4 method for textstat
sort(x, by, decreasing = TRUE)

as.bundle(object, ...)

# S4 method for textstat,textstat
+(e1, e2)

# S4 method for textstat
subset(x, subset)

# S3 method for textstat
as.data.table(x, ...)

# S4 method for textstat
show(object)

# S4 method for textstat
p_attributes(.Object)

# S4 method for textstat
knit_print(x, options = knitr::opts_chunk, ...)

# S4 method for textstat
get_corpus(x)

# S4 method for textstat
format(x, digits = 2L)

# S4 method for textstat
view(.Object)
Argument | Description |
---|---|
x | A |
value | A |
digits | Number of digits. |
by | Column that will serve as the key for sorting. |
decreasing | Logical, whether to return decreasing order. |
object | A textstat object. |
... | Argument that will be passed into a call of the |
e1 | A |
e2 | Another |
subset | A logical expression indicating elements or rows to keep. |
.Object | A |
options | Chunk options. |
The head method returns the first rows of the data.table in the stat slot. Use the argument n to specify the number of rows.
The tail method returns the last rows of the data.table in the stat slot. Use the argument n to specify the number of rows.
The methods dim, nrow and ncol return the dimensions, the number of rows, or the number of columns of the data.table in the stat slot, respectively.
Objects derived from the textstat class can be indexed with single square brackets ("[") to get the rows specified by a numeric/integer vector, and with double square brackets ("[[") to get specific columns from the data.table in the slot stat.
The colnames method returns the column names of the data.table in the slot stat.
The methods as.data.table and as.data.frame extract the data.table in the slot stat as a data.table or a data.frame, respectively.
textstat objects can have a name, which can be retrieved and set using the name method and name<-, respectively.
The round() method looks up all numeric columns in the data.table in the stat slot of the textstat object and rounds the values of these columns to the number of decimal places specified by the argument digits.
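To make the rounding behaviour concrete, here is a minimal base R sketch that uses a plain data.frame as a stand-in for the table in the stat slot. The helper round_numeric_columns is hypothetical and not part of polmineR. It assumes that only double-typed columns count as numeric, and the sample values are copied from the example output on this page.

```r
# Stand-in for the table in the stat slot (a data.frame, not a data.table).
stat <- data.frame(
  word = c("Bundesagentur", "Sozial"),
  count_coi = c(11L, 7L),
  ll = c(91.39629, 59.67241),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not the package's code):
# round every double column and leave all other columns untouched.
round_numeric_columns <- function(tab, digits = 2L) {
  is_dbl <- vapply(tab, is.double, logical(1))
  tab[is_dbl] <- lapply(tab[is_dbl], round, digits = digits)
  tab
}

rounded <- round_numeric_columns(stat, digits = 2L)
print(rounded)
```

With polmineR itself, the corresponding call would be round(x, digits = 2L) on a textstat object, as listed in the usage section.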
The knit_print method will be called by knitr to render `textstat` objects, or objects inheriting from the `textstat` class, as a DataTable htmlwidget when rendering an R Markdown document as HTML. It will usually be necessary to explicitly state "render = knit_print" in the chunk options. The option `polmineR.pagelength` controls the number of lines displayed in the resulting `htmlwidget`. Note that including htmlwidgets in HTML documents requires that pandoc is installed. To avoid an error, knit_print returns a formatted data.table if pandoc is not available.
The format() method returns a pretty-printed and minimized version of the data.table in the stat slot of the textstat object: it rounds all numeric columns to the number of decimal places specified by digits and drops all columns with token ids. The return value is a data.table.
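The two steps described for format() (dropping id columns, then rounding) can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: format_stat is a hypothetical helper, a data.frame stands in for the data.table, and token id columns are assumed to be those whose names end in "_id" (as word_id does in the example output on this page).

```r
# Stand-in for the table in the stat slot (a data.frame, not a data.table).
stat <- data.frame(
  word = c("Bundesagentur", "Sozial"),
  word_id = c(5360L, 12775L),
  ll = c(91.39629, 59.67241),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not the package's code).
format_stat <- function(tab, digits = 2L) {
  # Assumption: token id columns are the ones whose names end in "_id".
  keep <- !grepl("_id$", colnames(tab))
  tab <- tab[, keep, drop = FALSE]
  # Round the remaining double columns to the requested number of places.
  is_dbl <- vapply(tab, is.double, logical(1))
  tab[is_dbl] <- lapply(tab[is_dbl], round, digits = digits)
  tab
}

pretty <- format_stat(stat, digits = 2L)
print(pretty)
```

With polmineR itself, the corresponding call would be format(x, digits = 2L) on a textstat object.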
p_attribute
Object of class character, p-attribute of the query.
corpus
A corpus specified by a length-one character vector.
stat
A data.table with statistical information.
name
The name of the object.
annotation_cols
A character vector, column names of data.table in slot stat that are annotations.
encoding
A length-one character vector, the encoding of the corpus.
y <- cooccurrences(P, query = "Arbeit")

# generics defined in the polmineR package
x <- count("REUTERS", p_attribute = "word")
name(x) <- "count_reuters"
name(x)
#> [1] "count_reuters"
get_corpus(x)
#> [1] "REUTERS"

# Standard generic methods known from data.frames work for objects inheriting
# from the textstat class
head(y)
#>             word word_id count_partition count_coi count_ref     exp_coi
#> 1: Bundesagentur    5360              15        11         4  0.10741211
#> 2:           für      66            1625        55      1570 11.63631205
#> 3:        Sozial   12775               9         7         2  0.06444727
#> 4:             "     493             468        21       447  3.35125787
#> 5:          ihre     597             121        12       109  0.86645770
#> 6:       schafft     675              36         8        28  0.25778907
#>        exp_ref       ll rank_ll
#> 1:   14.892588 91.39629       1
#> 2: 1613.363688 86.51204       2
#> 3:    8.935553 59.67241       3
#> 4:  464.648742 42.65782       4
#> 5:  120.133542 41.95489       5
#> 6:   35.742211 41.32776       6
tail(y)
#>                              word word_id count_partition count_coi count_ref
#> 1:                       reguläre    2987               1         1         0
#> 2: sozialversicherungspflichtigen   13356               1         1         0
#> 3:                           tete    1440               1         1         0
#> 4:                      unsichere   12856               1         3         0
#> 5:                      verhelfen   13422               1         1         0
#> 6:          Überweisungsvorschlag    2473               7         7         0
#>        exp_coi   exp_ref ll rank_ll
#> 1: 0.007160807 0.9928392 NA     518
#> 2: 0.007160807 0.9928392 NA     519
#> 3: 0.007160807 0.9928392 NA     520
#> 4: 0.021482422 2.9785176 NA     521
#> 5: 0.007160807 0.9928392 NA     522
#> 6: 0.050125652 6.9498743 NA     523
nrow(y)
#> [1] 523
ncol(y)
#> [1] 9
dim(y)
#> [1] 523   9
colnames(y)
#> [1] "word"            "word_id"         "count_partition" "count_coi"
#> [5] "count_ref"       "exp_coi"         "exp_ref"         "ll"
#> [9] "rank_ll"

# Use brackets for indexing
if (FALSE) {
y[1:25]
y[, c("word", "ll")]
y[1:25, "word"]
y[1:25][["word"]]
y[which(y[["word"]] %in% c("Arbeit", "Sozial"))]
y[ y[["word"]] %in% c("Arbeit", "Sozial") ]
}

sc <- partition("GERMAPARLMINI", speaker = "Angela Dorothea Merkel")
cnt <- count(sc, p_attribute = c("word", "pos"))
cnt_min <- subset(cnt, pos %in% c("NN", "ADJA"))
cnt_min <- subset(cnt, pos == "NE")

# Get statistics in textstat object as data.table
count_dt <- corpus("REUTERS") %>%
  subset(grep("saudi-arabia", places)) %>%
  count(p_attribute = "word") %>%
  as.data.table()