Working with corpora is much more than counting. But counting words and lexical units is the basic operation for any more complex analysis, and may yield substantial results.
To count is to measure! Is my measurement valid? Are there sufficient safeguards to ensure that I measure what I intend to measure?
Statements about salience require taking the difference between absolute and relative frequencies seriously. Frequencies are counts normalised by dividing them by corpus or subcorpus size.
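To see why the distinction matters, here is a minimal sketch in base R. The counts and subcorpus sizes are invented for illustration and are not taken from any real corpus; the variable names are hypothetical as well.

```r
# Toy data, invented for illustration: hits for one term in two subcorpora
counts <- c(subcorpus_a = 120, subcorpus_b = 120)

# Sizes of the two subcorpora in tokens (also invented)
sizes <- c(subcorpus_a = 1000000, subcorpus_b = 4000000)

# Relative frequency: count divided by subcorpus size
freq <- counts / sizes

# Scale to hits per million tokens, which is easier to read
per_million <- freq * 1e6
print(per_million)
```

The absolute counts are identical, but the term is four times as frequent in the smaller subcorpus once counts are normalised by size.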
There is a wide variety of scenarios for counting: We will focus on time series analysis and dictionary-based analyses.
Basic methods for counting in the polmineR package are count(), dispersion() and as.TermDocumentMatrix(). These methods are applicable for corpus and subcorpus objects. For the following example, we use the corpus of the verbatim records of the UN General Assembly.
library(polmineR)
use("UNGA")