Source: R/cooccurrences.R, R/as.sparseMatrix.R, R/enrich.R
all-cooccurrences-class.Rd
The `Cooccurrences` class stores information on all cooccurrences in a corpus. As this data can be bulky, the `data.table` in the `stat` slot of a `Cooccurrences` object is modified in place wherever possible, to avoid copying potentially large objects. The class inherits from the `textstat` class, so methods for `textstat` objects are available for `Cooccurrences` objects as well (see examples).
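The in-place logic can be illustrated with plain `data.table` code. The sketch below is not polmineR code: the toy table, its columns `a_id` and `b_id`, and the helper `add_share()` are hypothetical stand-ins for the `stat` slot. It shows that a function using `:=` modifies the caller's table without copying it.

```r
library(data.table)

# Hypothetical stand-in for the stat slot: integer token ids and cooccurrence counts
stat <- data.table(a_id = c(0L, 0L, 1L), b_id = c(1L, 2L, 2L), ab_count = c(4L, 1L, 6L))

# Hypothetical helper: adds a column by reference with `:=`
add_share <- function(dt) {
  dt[, ab_share := ab_count / sum(ab_count)]
  invisible(dt)
}

addr_before <- address(stat)
add_share(stat)                      # return value is not caught
addr_after <- address(stat)

"ab_share" %in% names(stat)          # TRUE: the caller's table was modified
identical(addr_before, addr_after)   # TRUE: no copy was made
```

The same reasoning explains why several methods below return their result invisibly: the object the caller holds has already been changed.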
```r
# S4 method for Cooccurrences
as.simple_triplet_matrix(x)

# S4 method for Cooccurrences
as_igraph(
  x,
  edge_attributes = c("ll", "ab_count", "rank_ll"),
  vertex_attributes = "count",
  as.undirected = TRUE,
  drop = getOption("polmineR.villainChars")
)

# S4 method for Cooccurrences
subset(x, ..., by)

# S4 method for Cooccurrences
decode(.Object)

# S4 method for Cooccurrences
kwic(
  .Object,
  left = getOption("polmineR.left"),
  right = getOption("polmineR.right"),
  verbose = TRUE,
  progress = TRUE
)

# S4 method for Cooccurrences
as.sparseMatrix(x, col = "ab_count", ...)

# S4 method for Cooccurrences
enrich(.Object)
```
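Argument | Description |
---|---|
x | A `Cooccurrences` object. |
edge_attributes | Columns of the `data.table` in the `stat` slot to use as edge attributes. |
vertex_attributes | Vertex attributes to add to nodes. |
as.undirected | Logical, whether to return an undirected (`TRUE`) or a directed (`FALSE`) graph. |
drop | A character vector with the names of nodes to drop from the `igraph` object. |
... | Further arguments passed on to a subsequent method call. |
by | A `features` object (see the `subset` method). |
.Object | A `Cooccurrences` object. |
left | Number of tokens to the left of the node. |
right | Number of tokens to the right of the node. |
verbose | Logical, whether to output messages. |
progress | Logical, whether to show a progress bar. |
col | The column of the `stat` table to extract (default `"ab_count"`). |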
The `as.simple_triplet_matrix` method transforms a `Cooccurrences` object into a sparse matrix (a `simple_triplet_matrix`). For reasons of memory efficiency, token ids are decoded within the method as late as possible. Decoded tokens do NOT need to be present in the table of the `Cooccurrences` object.
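A minimal sketch of the late-decoding idea, using the slam package directly rather than polmineR: the matrix is assembled from integer ids, and strings are looked up only once per distinct id, for the dimnames. The `lexicon` vector, the assumption that ids are 0-based, and the columns `a_id`/`b_id` are hypothetical and do not describe the internals of the method.

```r
library(data.table)
library(slam)

# Hypothetical stat table: token ids only, no decoded strings
stat <- data.table(a_id = c(0L, 0L, 1L), b_id = c(1L, 2L, 2L), ab_count = c(4L, 1L, 6L))
# Hypothetical lexicon; ids are assumed to be 0-based, hence the + 1L below
lexicon <- c("oil", "price", "rose")

# Map token ids to consecutive row/column indices
ids <- sort(unique(c(stat$a_id, stat$b_id)))
terms <- lexicon[ids + 1L]  # decoding happens only here, once per distinct id

m <- simple_triplet_matrix(
  i = match(stat$a_id, ids),
  j = match(stat$b_id, ids),
  v = stat$ab_count,
  nrow = length(ids),
  ncol = length(ids),
  dimnames = list(terms, terms)
)

as.matrix(m)["oil", "price"]  # 4
```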
The `as_igraph` method can be used to turn a `Cooccurrences` object into an `igraph` object.
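How a cooccurrence table maps onto a graph can be sketched with igraph alone. This is an illustration under assumptions, not the method's implementation: the decoded columns `a` and `b`, the toy values and the punctuation node `"."` are hypothetical, while `ll`, `ab_count`, `rank_ll` and `count` mirror the default `edge_attributes` and `vertex_attributes`.

```r
library(igraph)

# Hypothetical decoded cooccurrence table; the first two columns are the node pair
stat <- data.frame(
  a        = c("oil",   "oil",  "price", "."),
  b        = c("price", "rose", "rose",  "oil"),
  ll       = c(12.3, 3.1, 8.7, 0.5),
  ab_count = c(4L, 1L, 6L, 9L),
  rank_ll  = c(1L, 3L, 2L, 4L),
  stringsAsFactors = FALSE
)
# Hypothetical node table carrying a 'count' vertex attribute
nodes <- data.frame(
  name  = c("oil", "price", "rose", "."),
  count = c(10L, 8L, 7L, 50L),
  stringsAsFactors = FALSE
)

# Remaining columns of 'stat' become edge attributes, those of 'nodes' vertex attributes
g <- graph_from_data_frame(stat, directed = FALSE, vertices = nodes)

# Drop unwanted nodes (here: punctuation) together with their edges
drop <- c(".", ",")
g <- delete_vertices(g, intersect(drop, V(g)$name))
```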
The `subset` method has a particular feature: it allows a `Cooccurrences` object to be subsetted by a `features` object resulting from a feature extraction that compares two `Cooccurrences` objects.
For reasons of memory efficiency, the initial `data.table` in the `stat` slot of a `Cooccurrences` object identifies tokens by an integer id, not by the token string. The `decode()` method replaces these integer columns with human-readable character vectors. Due to the reference semantics of `data.table` objects, this is an in-place operation, performed without copying the table. The modified object is returned invisibly; usually it is not necessary to catch the return value.
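The decoding step can be sketched with `data.table` alone. The helper `decode_ids()`, the `lexicon` lookup vector, the 0-based ids and the column names `a_id`, `b_id`, `a` and `b` are all hypothetical; the point is only that ids are swapped for strings by reference, so the caller need not catch the return value.

```r
library(data.table)

# Hypothetical stat table with integer token ids
stat <- data.table(a_id = c(0L, 0L, 1L), b_id = c(1L, 2L, 2L), ab_count = c(4L, 1L, 6L))
# Hypothetical lexicon; ids are assumed to be 0-based
lexicon <- c("oil", "price", "rose")

# Hypothetical helper: swaps id columns for strings, entirely by reference
decode_ids <- function(dt, lexicon) {
  dt[, a := lexicon[a_id + 1L]]        # add decoded columns
  dt[, b := lexicon[b_id + 1L]]
  dt[, c("a_id", "b_id") := NULL]      # remove the integer id columns
  setcolorder(dt, c("a", "b", "ab_count"))
  invisible(dt)
}

decode_ids(stat, lexicon)  # return value need not be caught
stat
```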
The `kwic` method adds a column with the concordances behind each statistical finding to the `data.table` in the `stat` slot, and to the `data.table` in the `stat` slot of the `partition` object held in the `partition` slot. It is an in-place operation.
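The `as.sparseMatrix` method returns a `sparseMatrix` based on the counts of term cooccurrences (by default, the column `ab_count`). At this stage, decoded tokens are required to be present, so `decode()` needs to be called first.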
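Once tokens are decoded, building the matrix amounts to mapping strings to row and column indices. The sketch below uses `Matrix::sparseMatrix()` directly on a hypothetical decoded table with columns `a`, `b` and `ab_count`; it illustrates the idea under those assumptions and is not the method's own code.

```r
library(Matrix)

# Hypothetical decoded cooccurrence table
stat <- data.frame(
  a = c("oil", "oil", "price"),
  b = c("price", "rose", "rose"),
  ab_count = c(4, 1, 6),
  stringsAsFactors = FALSE
)

# Shared vocabulary for rows and columns
terms <- sort(unique(c(stat$a, stat$b)))

sm <- sparseMatrix(
  i = match(stat$a, terms),
  j = match(stat$b, terms),
  x = stat$ab_count,
  dims = c(length(terms), length(terms)),
  dimnames = list(terms, terms)
)

sm["oil", "price"]  # 4
```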
The `enrich` method adds the columns 'a_count' and 'b_count' to the `data.table` in the 'stat' slot of the `Cooccurrences` object. If the counts for the subcorpus/partition from which the cooccurrences are derived are not yet present, counting is performed first.
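Adding 'a_count' and 'b_count' corresponds to two update joins against a table of token counts. In this sketch the `counts` table with columns `word` and `count`, and the decoded columns `a` and `b`, are hypothetical stand-ins for what polmineR holds internally.

```r
library(data.table)

# Hypothetical decoded cooccurrence table
stat <- data.table(
  a = c("oil", "oil", "price"),
  b = c("price", "rose", "rose"),
  ab_count = c(4L, 1L, 6L)
)
# Hypothetical token counts for the subcorpus/partition
counts <- data.table(word = c("oil", "price", "rose"), count = c(10L, 8L, 7L))

# Update joins: add a_count and b_count by reference
stat[counts, on = c(a = "word"), a_count := i.count]
stat[counts, on = c(b = "word"), b_count := i.count]
stat
```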
Slot | Description |
---|---|
left | Single integer value, number of tokens to the left of the node. |
right | Single integer value, number of tokens to the right of the node. |
p_attribute | A character vector, the p-attribute(s) the evaluation of the corpus is based on. |
corpus | Length-one character vector, the CWB corpus used. |
stat | A `data.table` with the statistical analysis of cooccurrences. |
encoding | Length-one character vector, the encoding of the corpus. |
partition | The `partition` that is the basis for computations. |
window_sizes | A `data.table` linking the number of tokens in the context of a token identified by id. |
minimized | Logical, whether the object has been minimized. |
See the documentation of the `Cooccurrences` method (including examples) for procedures to get and filter cooccurrence information. See the documentation of the `textstat` class for the methods available for this superclass of the `Cooccurrences` class.
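```r
library(polmineR)

# Cooccurrences in a window of two tokens to the left and right
X <- Cooccurrences("REUTERS", p_attribute = "word", left = 2L, right = 2L)
m <- as.simple_triplet_matrix(X)

# Not run
if (FALSE) {
  X <- Cooccurrences("REUTERS", p_attribute = "word", left = 5L, right = 5L)
  decode(X)  # in-place, required before as.sparseMatrix()
  sm <- as.sparseMatrix(X)
  stm <- as.simple_triplet_matrix(X)
}
```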