The partition class is used to manage subcorpora. It is an S4 class, and a set of methods is defined for the class. The class inherits from the classes count and textstat.

# S4 method for partition
p_attributes(.Object, p_attribute = NULL, decode = TRUE)

# S4 method for subcorpus
p_attributes(.Object, p_attribute = NULL, decode = TRUE)

is.partition(x)

# S4 method for partition
enrich(
  .Object,
  p_attribute = NULL,
  decode = TRUE,
  verbose = TRUE,
  mc = FALSE,
  ...
)

# S4 method for partition
as.regions(x)

# S4 method for partition
split(x, gap, ...)

Arguments

.Object

A partition object.

p_attribute

A p-attribute to use for the count (when enriching the object).

decode

logical value, whether to decode token ids into strings when performing count

x

A partition object.

verbose

logical value, whether to output messages

mc

logical value, whether to use multiple cores; if numeric, the number of cores to use

...

further parameters passed into count when calling enrich, and ...

gap

An integer value specifying the minimum gap between regions for performing the split.

Details

Because partition objects inherit from the count and textstat classes, the methods defined for those classes are available, including view to inspect the table in the stat slot, and name and name<- to retrieve or set the name of an object.

The is.partition function returns a logical value indicating whether x is a partition object.

The enrich-method will add a count of tokens defined by argument p_attribute to slot stat of the partition object.

The split-method will split a partition object into a partition_bundle wherever the gap between strucs exceeds the minimum number of tokens specified by gap. This is useful for splitting a plenary protocol into speeches. Note: To speed things up, the returned partitions will not include frequency lists. The lists can be prepared by applying enrich to the partition_bundle object that is returned.
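The gap-based splitting described above can be sketched in base R, without the package or a corpus. This is only an illustration of the idea, not the package's implementation: split_by_gap is a hypothetical helper, the regions matrix is made-up data, and measuring the gap as the number of tokens between the end of one region and the start of the next is an assumption.

```r
# Hypothetical helper (not part of the package): group the rows of a
# two-column matrix of regions so that a new group starts wherever the
# gap to the previous region exceeds `gap`.
split_by_gap <- function(cpos, gap) {
  stopifnot(is.matrix(cpos), ncol(cpos) == 2L)
  n <- nrow(cpos)
  if (n == 0L) return(list())
  # Assumption: the gap is the number of tokens between the end of one
  # region and the start of the next one.
  gaps <- cpos[-1L, 1L] - cpos[-n, 2L] - 1L
  # The first region always opens a group; later regions open a new
  # group only if the gap before them is larger than the threshold.
  group <- cumsum(c(TRUE, gaps > gap))
  lapply(split(seq_len(n), group), function(i) cpos[i, , drop = FALSE])
}

# Made-up regions: two close together, then two more after a large gap.
regions <- matrix(
  c(  0L,   9L,
     12L,  20L,
    700L, 750L,
    755L, 760L),
  ncol = 2L, byrow = TRUE
)

chunks <- split_by_gap(regions, gap = 500L)

# Number of tokens covered by each chunk (positions are inclusive).
sizes <- vapply(chunks, function(m) sum(m[, 2L] - m[, 1L] + 1L), integer(1))
```

With these made-up regions, only the gap before the third region exceeds 500, so the four regions end up in two chunks.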

Slots

name

A name to identify the object (character vector with length 1); useful when multiple partition objects are combined to a partition_bundle.

corpus

The CWB indexed corpus the partition is derived from (character vector with length 1).

encoding

Encoding of the corpus (character vector with length 1).

s_attributes

A named list with the s-attributes specifying the partition.

explanation

Object of class character, an explanation of the partition.

cpos

A matrix with left and right corpus positions defining regions (two columns).

annotations

Object of class list.

size

Total size of the partition (integer vector, length 1).

stat

An (optional) data.table with counts. If present, it speeds up the computation of cooccurrences, as the count does not need to be performed again.

metadata

Object of class data.frame, metadata information.

strucs

Object of class integer, the strucs defining the partition.

p_attribute

Object of class character indicating the p_attribute of the count in slot stat.

xml

Object of class character, whether the xml is flat or nested.

s_attribute_strucs

Object of class character, the base node.

key

Experimental, an s-attribute that is used as a key.

call

Object of class character, the call that generated the partition.
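To illustrate how the cpos and size slots relate, the following base R sketch derives a total size from a matrix of regions. The matrix is made-up data, and the sketch assumes that left and right corpus positions are both inclusive; it does not show how the package itself fills the slots.

```r
# Made-up regions: left and right corpus positions in two columns.
cpos <- matrix(
  c( 0L,  9L,
    12L, 20L),
  ncol = 2L, byrow = TRUE
)

# Assumption: both positions are inclusive, so a region from 0 to 9
# covers 10 tokens.
region_sizes <- cpos[, 2L] - cpos[, 1L] + 1L

# The total size is the sum of the sizes of all regions.
size <- sum(region_sizes)
```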

See also

The partition-class inherits from the textstat-class; see the respective documentation to learn more.

Author

Andreas Blaette

Examples

p <- partition("GERMAPARLMINI", date = "2009-11-11", speaker = "Norbert Lammert")
#> ... get encoding: latin1
#> ... get cpos and strucs
name(p) <- "Norbert Lammert"
pb <- split(p, gap = 500L)
summary(pb)
#>                 name size
#> 1   Norbert Lammert1  105
#> 2   Norbert Lammert2    8
#> 3   Norbert Lammert3   32
#> 4   Norbert Lammert4   22
#> 5   Norbert Lammert5   17
#> 6   Norbert Lammert6   77
#> 7   Norbert Lammert7 1273
#> 8   Norbert Lammert8   21
#> 9   Norbert Lammert9   23
#> 10 Norbert Lammert10   18
#> 11 Norbert Lammert11   32
#> 12 Norbert Lammert12   96