Package Basics

Getting started with polmineR (CLI and GUI usage).

polmineR()

polmineR-package

get_info() show_info()

Generic methods defined in the polmineR package

Corpora and subcorpora

Corpora, subcorpora/partitions, and partition_bundle objects are the point of departure for any analysis using polmineR. Using the constructors introduced in the next section requires an understanding of the basic methods and classes for generating and managing (sub-)corpora.

use()

Add corpora in R data packages to session registry.

corpus(<character>) corpus(<missing>)

Corpus class initialization

name(<corpus>) get_corpus(<corpus>) show(<corpus>) `$`(<corpus>) get_info(<corpus>) show_info(<corpus>)

Corpus class methods

subset(<corpus>) subset(<character>) subset(<subcorpus>) subset(<remote_corpus>)

Subsetting corpora and subcorpora
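The sketch below shows how these constructors fit together. It assumes that polmineR is installed with its bundled sample corpora, and that "2009-11-10" is one of the session dates in GERMAPARLMINI. That date is only an illustrative assumption.

```r
library(polmineR)

# Make the sample corpora shipped with polmineR available in this session
use("polmineR")

# Initialize a corpus object from the corpus ID
gparl <- corpus("GERMAPARLMINI")

# Derive a subcorpus by evaluating an expression on the s-attribute 'date'
sc <- subset(gparl, date == "2009-11-10")

size("GERMAPARLMINI") # tokens in the whole corpus
size(sc)              # tokens in the subcorpus
```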

ocpu_exec()

Execute code on OpenCPU server

aggregate(<slice>)

Virtual class slice.

partition()

Initialize a partition.

partition_bundle()

Generate bundle of partitions.
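Here is a minimal sketch of partition() and partition_bundle(). It again assumes the GERMAPARLMINI sample corpus and uses an illustrative date value.

```r
library(polmineR)

# Make the sample corpora shipped with polmineR available
use("polmineR")

# A partition defined by a value of the s-attribute 'date'
p <- partition("GERMAPARLMINI", date = "2009-11-10")

# A bundle with one partition for each value of 'date'
pb <- partition_bundle("GERMAPARLMINI", s_attribute = "date")

size(p)    # number of tokens in the partition
length(pb) # number of partitions in the bundle
```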

p_attributes()

Get p-attributes.

s_attributes()

Get s-attributes.

as.speeches()

Split corpus or partition into speeches.

summary(<subcorpus>) `name<-`(<subcorpus>) get_corpus(<subcorpus>) size(<subcorpus>)

The S4 subcorpus class.

show(<subcorpus_bundle>) merge(<subcorpus_bundle>) merge(<subcorpus>) split(<subcorpus>) split(<corpus>) split(<subcorpus_bundle>)

Bundled subcorpora

as.regions() as.data.table(<regions>)

Regions of a CWB corpus.

Methods I: Basic Vocabulary

Data analysis using polmineR will usually start by calling a constructor method that returns an S4 object whose class is named after the method (e.g., kwic() returns a kwic object). All methods of the basic vocabulary can be applied to corpora (defined by a character vector), to partitions, and to partition_bundle objects; some are applicable to additional classes. These methods are designed to be used in a pipe.
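The sketch below shows the pipe-based style with a corpus given as a character vector. It assumes R >= 4.1 for the native pipe `|>` and the REUTERS sample corpus. The query "oil" is only an example.

```r
library(polmineR)
use("polmineR")

# Keyword-in-context concordance for a single-token query
k <- "REUTERS" |> kwic(query = "oil")

# Cooccurrence statistics around the same node word
cooc <- "REUTERS" |> cooccurrences(query = "oil")

# Frequency counts for all word types (no query given)
cnt <- "REUTERS" |> count(p_attribute = "word")
```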

kwic()

Perform keyword-in-context (KWIC) analysis.

hits()

Get hits for a query.

count()

Get counts.

dispersion()

Dispersion of a query or multiple queries.

ngrams()

Get N-Grams

context()

Analyze context of a node word.

cooccurrences()

Get cooccurrence statistics.

features()

Get features by comparison.

as.phrases(<ngrams>) as.phrases(<matrix>) as.character(<phrases>) concatenate_phrases()

Manage and use phrases

Methods II: Processing Objects

The polmineR package offers a repertoire of methods for modifying base objects and for type conversion.

size()

Get Number of Tokens.

terms(<slice>) terms(<partition>) terms(<subcorpus>) terms(<character>)

Get terms in partition or corpus.

enrich()

Enrich an object.

trim() punctuation

Trim an object.

noise()

Detect noise.

encoding() `encoding<-`()

Get and set encoding.

as.TermDocumentMatrix() as.DocumentTermMatrix()

Generate TermDocumentMatrix / DocumentTermMatrix.

as.sparseMatrix()

Type conversion - get sparseMatrix.

as.VCorpus(<partition_bundle>)

Get VCorpus.

chisquare()

Perform chi-square test.
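The base-R sketch below makes the statistic concrete. It computes a chi-square value for a 2×2 table that compares a term's count in a corpus of interest with its count in a reference corpus. This is a textbook formulation without continuity correction. It is an assumption about what chisquare() measures, not polmineR's code, and the helper chisquare_sketch() is hypothetical.

```r
# Hypothetical helper: chi-square statistic for one term.
# a: count in corpus of interest, b: count in reference corpus
# size_coi, size_ref: total number of tokens in each corpus
chisquare_sketch <- function(a, b, size_coi, size_ref) {
  n <- size_coi + size_ref
  o11 <- a
  o12 <- b
  o21 <- size_coi - a # other tokens in the corpus of interest
  o22 <- size_ref - b # other tokens in the reference corpus
  # 2x2 shortcut formula: N (ad - bc)^2 / (row and column sums)
  n * (o11 * o22 - o12 * o21)^2 /
    ((o11 + o12) * (o21 + o22) * size_coi * size_ref)
}

chisquare_sketch(50, 30, 10000, 20000)
```

The value agrees with base R's `chisq.test(matrix(c(50, 9950, 30, 19970), nrow = 2), correct = FALSE)`.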

ll()

Compute Log-likelihood Statistics.
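Corpus linguistics commonly compares a term's frequency across two corpora with the two-cell log-likelihood (G2). Assuming ll() follows that formulation, the statistic can be sketched in base R as shown below. The helper ll_sketch() is hypothetical and is not polmineR's implementation.

```r
# Hypothetical helper: log-likelihood (G2) for one term, comparing
# a corpus of interest (coi) with a reference corpus (ref).
ll_sketch <- function(a, b, size_coi, size_ref) {
  # Expected counts if the term were equally frequent in both corpora
  e1 <- size_coi * (a + b) / (size_coi + size_ref)
  e2 <- size_ref * (a + b) / (size_coi + size_ref)
  # x * log(x / e) is taken as 0 when the observed count x is 0
  term <- function(x, e) ifelse(x == 0, 0, x * log(x / e))
  2 * (term(a, e1) + term(b, e2))
}

ll_sketch(10, 5, 1000, 2000) # 6.931472, i.e. 10 * log(2)
```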

t_test()

Perform t-test.
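The textbook t-score for collocations compares the observed probability of a word pair with the probability expected if the two words were independent. The base-R sketch below assumes t_test() follows this common formulation. It is not polmineR's code, and the helper t_score_sketch() is hypothetical.

```r
# Hypothetical helper: t-score for a cooccurrence.
# o: observed cooccurrence count of node and collocate
# f_node, f_coll: corpus frequencies of node and collocate
# n: corpus size in tokens
t_score_sketch <- function(o, f_node, f_coll, n) {
  x_bar <- o / n                    # observed probability of the pair
  mu <- (f_node / n) * (f_coll / n) # expected probability under independence
  # For small probabilities, the sample variance is approximated by x_bar
  (x_bar - mu) / sqrt(x_bar / n)
}

t_score_sketch(10, 100, 50, 10000)
```

Algebraically this reduces to (O - E) / sqrt(O), where E = f_node * f_coll / n is the expected count.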

pmi()

Calculate Pointwise Mutual Information (PMI).
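PMI is conventionally defined as the base-2 logarithm of the ratio between two words' joint probability and the product of their individual probabilities. The sketch below implements that definition in base R. The helper pmi_sketch() is hypothetical. Whether pmi() uses base 2 or applies further adjustments is not stated here.

```r
# Hypothetical helper: pointwise mutual information (base 2).
# o: cooccurrence count; f_node, f_coll: word frequencies; n: corpus size
pmi_sketch <- function(o, f_node, f_coll, n) {
  # log2( P(node, coll) / (P(node) * P(coll)) )
  log2((o / n) / ((f_node / n) * (f_coll / n)))
}

pmi_sketch(10, 100, 50, 10000) # log2(20), about 4.32
```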

means()

Calculate means.

weigh()

Apply Weight to Matrix

store() mail() browse()

Defunct functionality

dotplot()

dotplot

restore()

Restore S4 object with data.table slots

Methods III: Fulltext Output

A key concern of polmineR is to combine quantitative and qualitative steps of analysis in one workflow. A set of methods is available to generate and enhance full-text output.

read()

Display full text.

html()

Generate html from object.

annotations() `annotations<-`() edit(<textstat>)

Annotation functionality

as.markdown()

Get markdown-formatted full text of a partition.

store() mail() browse()

Defunct functionality

view()

Inspect object using View().

highlight()

Highlight tokens in text output.

tooltips()

Add tooltips to text output.

get_template()

Get template for reconstructing full text.

get_type()

Get corpus/partition type.

partition_to_string

Decode as String.

The 'textstat' Class & Child Classes

The main feature of the textstat class, and of the classes inheriting from it, is that statistical information about a corpus or partition is kept in a data.table in the slot ‘stat’. Methods defined for the textstat superclass are available to its child classes unless a child class overrides them.
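The sketch below shows how the stat slot surfaces in practice. It assumes the REUTERS sample corpus, and that count() without a query returns a count object, one of the textstat child classes listed below. The data.table package is loaded for the as.data.table() generic.

```r
library(polmineR)
library(data.table) # provides the as.data.table() generic
use("polmineR")

# count() without a query returns an object of class 'count'
cnt <- count("REUTERS", p_attribute = "word")

# Retrieve the data.table held in the 'stat' slot
dt <- as.data.table(cnt)

# Most frequent word types first
head(dt[order(-count)])
```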

name(<textstat>) name(<character>) `name<-`(<textstat>) round(<textstat>) sort(<textstat>) as.bundle() `+`(<textstat>,<textstat>) subset(<textstat>) as.data.table(<textstat>) show(<textstat>) p_attributes(<textstat>) knit_print(<textstat>) get_corpus(<textstat>) format(<textstat>) view(<textstat>)

S4 textstat superclass.

summary(<count>) length(<count>) hist(<count>)

Count class.

ngrams_class

Ngrams class.

p_attributes(<partition>) p_attributes(<subcorpus>) is.partition() enrich(<partition>) as.regions(<partition>) split(<partition>)

Partition class and methods.

sample(<hits>)

Hits class.

length(<context>) p_attributes(<context>) count(<context>) sample(<context>) enrich(<context>) as.regions(<context>) trim(<context>)

Context class.

show(<cooccurrences>) as.data.frame(<cooccurrences_bundle>) format(<cooccurrences>) view(<cooccurrences>) view(<cooccurrences_reshaped>)

Cooccurrences class.

as.simple_triplet_matrix(<Cooccurrences>) as_igraph(<Cooccurrences>) subset(<Cooccurrences>) decode(<Cooccurrences>) kwic(<Cooccurrences>) as.sparseMatrix(<Cooccurrences>) enrich(<Cooccurrences>)

Cooccurrences class for corpus/partition.

Cooccurrences(<corpus>) Cooccurrences(<character>) Cooccurrences(<slice>) Cooccurrences(<partition>) Cooccurrences(<subcorpus>)

Get all cooccurrences in corpus/partition.

summary(<features>) show(<features>) summary(<features_bundle>) format(<features>) view(<features>)

Feature selection by comparison.

The 'kwic' & 'Labels' Classes

get_corpus(<kwic>) count(<kwic>) as.DocumentTermMatrix(<kwic>) as.TermDocumentMatrix(<kwic>) show(<kwic>) knit_print(<kwic>) as.character(<kwic>) `[`(<kwic>,<ANY>,<ANY>,<ANY>) subset(<kwic>) as.data.frame(<kwic>) length(<kwic>) sample(<kwic>) merge(<kwic_bundle>) enrich(<kwic>) format(<kwic>) view(<kwic>)

S4 kwic class

The 'bundle' Class & Child Classes

S4 classes for managing text analysis.

`name<-`(<bundle>) length(<bundle>) names(<bundle>) `names<-`(<bundle>,<vector>) unique(<bundle>) `+`(<bundle>,<bundle>) `+`(<bundle>,<textstat>) `[[`(<bundle>) `[[<-`(<bundle>) `$`(<bundle>) `$<-`(<bundle>) sample(<bundle>) as.bundle(<list>) as.bundle(<textstat>) as.data.table(<bundle>) as.matrix(<bundle>) subset(<bundle>) as.list(<bundle>) get_corpus(<bundle>)

Bundle Class

show(<partition_bundle>) summary(<partition_bundle>) merge(<partition_bundle>) `[`(<partition_bundle>,<ANY>,<ANY>,<ANY>) barplot(<partition_bundle>) as.partition_bundle(<list>) partition_bundle(<environment>) enrich(<partition_bundle>) s_attributes(<partition_bundle>) flatten()

Bundle of partitions (partition_bundle class).

context_bundle-class

S4 context_bundle class

blapply()

Apply a function over a list or bundle.

Low-level CWB Access

CWB-indexed corpora are accessed through the RcppCWB package. A set of functions in the polmineR package serves as an intermediate interface to corpus data.
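Here is a sketch of a few of these low-level helpers. It assumes the REUTERS sample corpus, and the query "oil" is only illustrative.

```r
library(polmineR)
use("polmineR")

# Registry directory used in this session
registry()

# A quoted (or bracketed) query is treated as CQP syntax,
# a bare token is not
is.cqp('"oil"')
is.cqp("oil")

# Corpus positions of matches: a two-column matrix (start, end),
# one row per match
m <- cpos("REUTERS", query = "oil")
head(m)
```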

registry_move() registry() data_dir()

Get registry and data directories.

registry_reset()

Reset registry directory.

registry_get_name() registry_get_id() registry_get_home() registry_get_info() registry_get_encoding() registry_get_p_attributes() registry_get_s_attributes() registry_get_properties()

Evaluate registry file.

cpos()

Get corpus positions for a query or queries.

get_token_stream()

Get Token Stream.

decode()

Decode corpus or subcorpus.

is.cqp() check_cqp_query() as.cqp()

Tools for CQP queries.

Utility Functions

Exported utility functions / helpers.

as.utf8() as.nativeEnc() as.corpusEnc()

Conversion between corpus and native encoding.

store() mail() browse()

Defunct functionality

Backwards Compatibility

Starting with v0.7.9, the coding style of the polmineR package has moved from camelCase to snake_case. A set of ‘old style’ functions ensures backwards compatibility, so that old code does not break.

sAttributes() pAttributes() getTokenStream() getTerms() getEncoding() partitionBundle() as.partitionBundle() corpus(<textstat>) corpus(<bundle>) corpus(<kwic>)

Renamed Functions