A set of generic methods is available to extract basic information from objects of the corpus class.

# S4 method for corpus
name(x)

# S4 method for corpus
get_corpus(x)

# S4 method for corpus
show(object)

# S4 method for corpus
x$name

# S4 method for corpus
get_info(x)

# S4 method for corpus
show_info(x)

Arguments

x

An object of class corpus, or inheriting from it.

object

An object of class corpus, or inheriting from it.

name

A (single) s-attribute.

Details

A corpus object can have a name, which can be retrieved using the name-method.

Use the get_corpus-method to get the corpus ID from the slot corpus of the corpus object.

The show-method will show basic information on the corpus object.

Applying the `$`-method to a corpus returns the values of the s-attribute specified by the argument name.

Use get_info to get the content of the info file for the corpus (usually in the data directory of the corpus) and return it as a character vector. Returns NULL if there is no info file.
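The behaviour described for get_info can be pictured with a small base-R sketch. This is not polmineR's implementation: the helper name read_info_sketch(), the candidate file names info.md and info, and the temporary directory are assumptions made for illustration only. The sketch mirrors what is documented here: it returns a character vector, returns NULL if no info file exists, and sets an attribute md to flag markdown files, as seen in the example output of get_info.

```r
# Hypothetical helper, NOT part of polmineR: mimics the documented behaviour
# of get_info() for a plain directory instead of a CWB corpus.
read_info_sketch <- function(data_dir, candidates = c("info.md", "info")) {
  paths <- file.path(data_dir, candidates)
  found <- paths[file.exists(paths)]
  if (length(found) == 0L) return(NULL)  # no info file: return NULL
  info <- readLines(found[1], warn = FALSE)
  # Flag markdown content if the filename ends in "md" (assumed convention)
  attr(info, "md") <- grepl("md$", found[1])
  info
}

# Usage with a throwaway directory (assumed layout, for illustration only)
demo_dir <- tempfile("corpus_data_")
dir.create(demo_dir)
no_info <- read_info_sketch(demo_dir)  # directory is still empty
writeLines(c("# DEMO corpus", "", "## About"), file.path(demo_dir, "info.md"))
with_info <- read_info_sketch(demo_dir)
```

Returning NULL rather than raising an error lets calling code test for a missing info file with is.null() before trying to display anything.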

The show_info-method will get the content of the info file for a corpus, turn it into an HTML document, and show the result in the viewer pane of RStudio. If the filename of the info file ends in "md", the document is rendered as markdown.

Examples

# get/show information on corpora
corpus("REUTERS") %>% get_info()
#> [1] "# REUTERS corpus"
#> [2] ""
#> [3] "#### About"
#> [4] ""
#> [5] "This is the REUTERS corpus included in the polmineR-package as a demo corpus. The original data is included in the tm package. See the documentation of the encode-method to learn how the CWB-indexed version of the corpus has been generated."
#> attr(,"md")
#> [1] TRUE
corpus("REUTERS") %>% show_info()
corpus("GERMAPARLMINI") %>% get_info()
#> [1] "# GERMAPARLMINI corpus"
#> [2] ""
#> [3] "## About"
#> [4] ""
#> [5] "This is an excerpt from the GERMAPARL corpus."
#> attr(,"md")
#> [1] TRUE
corpus("GERMAPARLMINI") %>% show_info()

# show-method
if (interactive()) corpus("REUTERS") %>% show()
if (interactive()) corpus("REUTERS") # show is called implicitly

# get corpus ID
corpus("REUTERS") %>% get_corpus()
#> [1] "REUTERS"

# use $ to access s_attributes quickly
use("polmineR")
#> ... activating corpus: GERMAPARLMINI (version: 0.0.1 | build date: 2019-02-23)
#> ... activating corpus: REUTERS
g <- corpus("GERMAPARLMINI")
g$date
#> [1] "2009-10-27" "2009-10-28" "2009-11-10" "2009-11-11" "2009-11-12"
corpus("GERMAPARLMINI")$date
#> [1] "2009-10-27" "2009-10-28" "2009-11-10" "2009-11-11" "2009-11-12"
corpus("GERMAPARLMINI") %>% s_attributes(s_attribute = "date") # equivalent
#> [1] "2009-10-27" "2009-10-28" "2009-11-10" "2009-11-11" "2009-11-12"
use("polmineR")
#> ... activating corpus: GERMAPARLMINI (version: 0.0.1 | build date: 2019-02-23)
#> ... activating corpus: REUTERS
sc <- subset("GERMAPARLMINI", date == "2009-10-27")
sc$date
#> [1] "2009-10-27"