Mind the Neighborhood

September 30, 2019

The Neighborhood matters

“You shall know a word by the company it keeps” (Firth, J. R. 1957: 11)
Methods for analysing collocations (cooccurrences) evaluate whether a term occurs in the context of a node more often than chance would predict, i.e. they test the statistical significance of its occurrence there.
The statistical identification of cooccurrences highlights language patterns that deserve attention. Statements about meaning, however, require that the quantitative analysis of text be combined with qualitative methods, i.e. an inspection of concordances.
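To make the idea of a statistical test value more concrete, here is a minimal sketch of one common formulation, the two-term log-likelihood statistic, in base R. The function name log_likelihood() and its argument names are chosen for illustration only, and the counts are made up; it is an assumption, not something the slides confirm, that polmineR computes its ll column in exactly this way.

```r
# Two-term log-likelihood statistic for one candidate word (illustrative sketch).
# count_coi: occurrences of the word inside the context windows
# count_ref: occurrences of the word in the rest of the corpus
# size_coi:  total number of tokens inside the context windows
# size_ref:  total number of tokens in the rest of the corpus
log_likelihood <- function(count_coi, count_ref, size_coi, size_ref) {
  total <- count_coi + count_ref
  # Expected counts if the word were spread evenly over both parts
  exp_coi <- size_coi * total / (size_coi + size_ref)
  exp_ref <- size_ref * total / (size_coi + size_ref)
  # A zero count contributes nothing to the sum (0 * log(0) is taken as 0)
  term <- function(obs, exp) ifelse(obs > 0, obs * log(obs / exp), 0)
  2 * (term(count_coi, exp_coi) + term(count_ref, exp_ref))
}

# Made-up counts, for illustration only
# Observed equals expected here, so the statistic is 0
ll_even <- log_likelihood(count_coi = 10, count_ref = 90,
                          size_coi = 1000, size_ref = 9000)
# The word is strongly over-represented in the windows, so the value is large
ll_skewed <- log_likelihood(count_coi = 50, count_ref = 50,
                            size_coi = 1000, size_ref = 9000)
```

The larger the value, the less plausible it is that the word's frequency near the node is a product of chance.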
The following examples are based on the UNGA-corpus.

library(polmineR)
library(magrittr) # provides the %>% pipe used in the examples below
use("UNGA")

Getting cooccurrences

cooccurrences("UNGA", query = 'migration', left = 10, right = 10)
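What the left and right arguments mean can be illustrated with a small base R sketch that counts the tokens within a fixed window around each occurrence of the node. The helper window_counts() and the toy token vector are hypothetical. Using each corpus position only once where windows overlap is a simplifying assumption of this sketch, not a documented description of how polmineR handles that case.

```r
# Toy token sequence; a real corpus would supply the tokens
tokens <- c("the", "global", "compact", "on", "migration", "addresses",
            "irregular", "migration", "and", "safe", "migration")

# Count the tokens occurring within `left`/`right` positions of each node hit
window_counts <- function(tokens, node, left = 10, right = 10) {
  hits <- which(tokens == node)
  positions <- unlist(lapply(hits, function(i) {
    seq(max(1, i - left), min(length(tokens), i + right))
  }))
  # Use each position once, even where windows overlap, and drop the node itself
  positions <- setdiff(unique(positions), hits)
  sort(table(tokens[positions]), decreasing = TRUE)
}

# A narrow window of two tokens on either side keeps the toy example readable
counts <- window_counts(tokens, node = "migration", left = 2, right = 2)
```

With the narrow window, "the" and "global" fall outside every context and are not counted. These window counts are the observed frequencies that a statistical test then compares against the rest of the corpus.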

Visualising results: Wordclouds

Filtering results

The subset()-method can be used on cooccurrences-objects to filter results. Here, we use …
- a minimum statistical test value (ll-value of at least 11.83),
- a minimum number of observations (count_coi of at least 5),
- a stoplist to exclude function words, and
- a short list of punctuation tokens to exclude.

cooccurrences("UNGA", query = '"[mM]igration"', left = 10, right = 10) %>% 
  subset(ll >= 11.83) %>%
  subset(count_coi >= 5) %>% 
  subset(!tolower(word) %in% tm::stopwords("en")) %>%
  subset(!word %in% c("''", ",", "``"))
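The logic of the filter chain can be tried out without a corpus. The sketch below applies the same conditions to a toy data.frame in base R. All values are invented, the object names toy, stoplist and filtered are hypothetical, and the hand-written stoplist merely stands in for tm::stopwords("en").

```r
# Toy table mimicking the columns used in the filter; all values are invented
toy <- data.frame(
  word = c("irregular", "the", "international", ",", "forced"),
  count_coi = c(12, 250, 40, 300, 3),
  ll = c(85.2, 2.1, 30.4, 0.5, 15.9),
  stringsAsFactors = FALSE
)

# A tiny hand-written stoplist standing in for tm::stopwords("en")
stoplist <- c("the", "of", "and")

# All conditions combined in one call; chaining several subset() calls
# with a pipe selects the same rows
filtered <- subset(
  toy,
  ll >= 11.83 & count_coi >= 5 &
    !tolower(word) %in% stoplist &
    !word %in% c("''", ",", "``")
)
```

In the toy data, "forced" passes the ll threshold but is dropped because it was observed fewer than five times, which shows why the two thresholds are applied together.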

Filtered results

From quantity to quality

kwic("UNGA", query = "migration", positivelist = "irregular", left = 10L, right = 10L) %>%
  highlight(yellow = "irregular")
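To show what restricting concordances with a positivelist amounts to, here is a rough base R sketch that builds keyword-in-context lines for a toy token vector and keeps only those whose context contains a required word. The helper kwic_lines() is hypothetical and makes no claim to mirror the internals of kwic().

```r
# Toy token sequence; a real corpus would supply the tokens
tokens <- c("the", "global", "compact", "on", "migration", "addresses",
            "irregular", "migration", "and", "safe", "migration")

# Build one concordance line per node hit and keep only the lines whose
# left or right context contains a word from the positivelist
kwic_lines <- function(tokens, node, positivelist, left = 10, right = 10) {
  hits <- which(tokens == node)
  if (length(hits) == 0) return(NULL)
  lines <- lapply(hits, function(i) {
    left_ctx <- tail(tokens[seq_len(i - 1)], left)
    right_ctx <- head(tokens[-seq_len(i)], right)
    data.frame(
      left = paste(left_ctx, collapse = " "),
      node = tokens[i],
      right = paste(right_ctx, collapse = " "),
      keep = any(c(left_ctx, right_ctx) %in% positivelist),
      stringsAsFactors = FALSE
    )
  })
  lines <- do.call(rbind, lines)
  lines[lines$keep, c("left", "node", "right")]
}

# A narrow window of two tokens on either side keeps the toy example readable
conc <- kwic_lines(tokens, node = "migration", positivelist = "irregular",
                   left = 2, right = 2)
```

Reading the remaining lines in full is the qualitative step: the statistic points to "irregular", and the concordances show how the word is actually used next to the node.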


Graph annotation