A dataset with summary information on the corpus by legislative period. It is included in the package so that it can be used in the data report of the package vignette.

germaparl_by_lp

Format

A data.frame with 5 rows and 6 variables with summary statistics on the GermaParl corpus, broken down by legislative period.

lp

legislative period (integer value)

protocols

total number of protocols included in the corpus for the respective legislative period (integer value)

first

date of the first plenary protocol in the legislative period (Date class)

last

date of the last plenary protocol in the legislative period (Date class)

size

number of tokens in the subcorpus for the respective legislative period (integer value)

unknown_total

total number of words that cannot be lemmatized and are therefore tagged #unknown# (numeric value)

unknown_share

share of words that cannot be lemmatized and are therefore tagged #unknown# (numeric value)

The table is based on v1.0.6 of the corpus. It was prepared with the script available at data-raw/stats_for_vignette.R.

Value

A data.frame.
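
The column layout described above can be illustrated with a small sketch. The data.frame below is a hypothetical stand-in called by_lp with invented placeholder values, not figures from the corpus. It also assumes that unknown_share is unknown_total divided by size, which the description suggests but does not state. With the package attached, the real table would presumably be available as germaparl_by_lp.

```r
# Hypothetical stand-in for germaparl_by_lp. All values are invented
# placeholders and do not describe the GermaParl corpus.
by_lp <- data.frame(
  lp = c(1L, 2L),
  protocols = c(10L, 20L),
  first = as.Date(c("2000-01-01", "2001-01-01")),
  last = as.Date(c("2000-12-31", "2001-12-31")),
  size = c(1000L, 4000L),
  unknown_total = c(50, 100)
)

# Assumption: the share is the number of unlemmatized words relative to
# the size of the subcorpus for the legislative period.
by_lp$unknown_share <- by_lp$unknown_total / by_lp$size

# Number of days between the first and last protocol of each period.
by_lp$days <- as.numeric(by_lp$last - by_lp$first)

# Legislative periods ordered by share of unlemmatized words, highest first.
by_lp[order(by_lp$unknown_share, decreasing = TRUE), c("lp", "unknown_share")]
```

Because first and last are of class Date, they can be subtracted directly, which is convenient when relating corpus size to the length of a legislative period.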