Structural annotations (s-attributes) of a corpus capture metainformation for regions of tokens. The s_attributes-method offers high-level access to the s-attributes present in a corpus or subcorpus, or the values of s-attributes in a corpus/partition.

s_attributes(.Object, ...)

# S4 method for character
s_attributes(.Object, s_attribute = NULL, unique = TRUE, regex = NULL, ...)

# S4 method for corpus
s_attributes(.Object, s_attribute = NULL, unique = TRUE, regex = NULL, ...)

# S4 method for slice
s_attributes(.Object, s_attribute = NULL, unique = TRUE, ...)

# S4 method for partition
s_attributes(.Object, s_attribute = NULL, unique = TRUE, ...)

# S4 method for subcorpus
s_attributes(.Object, s_attribute = NULL, unique = TRUE, ...)

# S4 method for call
s_attributes(.Object, corpus)

# S4 method for remote_corpus
s_attributes(.Object, ...)

# S4 method for remote_partition
s_attributes(.Object, ...)

Arguments

.Object

A corpus, subcorpus, partition object, or a call. A corpus can also be specified by a length-one character vector.

...

Retained for backward compatibility, in case the deprecated argument sAttribute is used. If .Object is a remote_corpus or remote_subcorpus object, the three dots (...) are used to pass arguments. Hence, it is necessary to state the names of all arguments to be passed explicitly.

s_attribute

The name of a specific s-attribute.

unique

A logical value, whether to return unique values.

regex

A regular expression passed to grep to filter the return value.

corpus

A corpus object or a length-one character vector denoting a corpus.

Value

A character vector (s-attributes, or values of s-attributes).

Details

Importing XML into the Corpus Workbench (CWB) turns elements and element attributes into so-called "s-attributes". There are two basic uses of the s_attributes-method: If the argument s_attribute is NULL (default), the return value is a character vector with all s-attributes present in a corpus.

If s_attribute is the name of a specific s-attribute (a length-one character vector), the values of that s-attribute available in the corpus/partition are returned.

If argument unique is FALSE, the full sequence of the values of the s-attribute is returned, which is a useful building block for decoding a corpus.
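The effect of the arguments unique and regex can be mimicked in base R on a toy vector. This is only a sketch: the vector region_values is made up for illustration (one value per region), the ordering of the unique values is a property of this sketch, and the code is not how polmineR retrieves values from a corpus.

```r
# Hypothetical values of a "party" s-attribute, one per region (toy data).
region_values <- c("CDU_CSU", "SPD", "CDU_CSU", "FDP", "SPD", "DIE_LINKE")

# unique = FALSE corresponds to the full sequence, one value per region.
full_sequence <- region_values

# unique = TRUE corresponds to reporting each value only once.
unique_values <- unique(region_values)

# regex: keep only values matching a regular expression, as grep() does.
filtered <- grep("^(SPD|FDP)$", unique_values, value = TRUE)

print(length(full_sequence))
print(unique_values)
print(filtered)
```

The full sequence keeps one entry per region, which is why it can serve as a building block when values need to be aligned with regions of tokens.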

If argument s_attribute is a character vector providing several s-attributes, the method will return a data.table. If unique is TRUE, the data.table reports all unique combinations of the values of these s-attributes.
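What "unique combinations" means can be illustrated without a corpus. The sketch below uses a base R data.frame instead of a data.table to stay free of dependencies; the table regions and its rows are hypothetical toy data, not output of polmineR.

```r
# Hypothetical regions with two s-attributes each (toy data).
regions <- data.frame(
  date = c("2009-10-27", "2009-10-27", "2009-10-27", "2009-10-28"),
  party = c("SPD", "FDP", "SPD", "FDP"),
  stringsAsFactors = FALSE
)

# Corresponds to unique = TRUE: drop repeated date/party combinations.
combinations <- unique(regions)
rownames(combinations) <- NULL

print(combinations)
```

A combination is dropped only if both values are repeated, so the same party may still appear once per date.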

If .Object is a call, the s_attributes-method will return a character vector with the s-attributes occurring in the call. This usage is relevant internally to implement the subset method to generate a subcorpus using non-standard evaluation. Usually it will not be relevant in an interactive session.
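One way to picture the extraction of s-attributes from a call is to collect the variable names of the expression with base R's all.vars() and keep those that are s-attributes of the corpus. This is a minimal sketch under that assumption; the helper s_attributes_in_call and the vector known_s_attributes are hypothetical and do not show how polmineR implements the method.

```r
# S-attributes of the corpus, as reported in the Examples section.
known_s_attributes <- c("interjection", "date", "party", "speaker")

# Hypothetical helper: names used in an unevaluated expression that
# are also s-attributes of the corpus.
s_attributes_in_call <- function(expr, known) {
  vars <- all.vars(expr)  # variable names only, function names are skipped
  vars[vars %in% known]
}

single <- s_attributes_in_call(
  quote(grep("Merkel", speaker)), known_s_attributes
)
combined <- s_attributes_in_call(
  quote(speaker == "Angela Merkel" & date == "2009-10-28"), known_s_attributes
)
# Names that are not s-attributes (here: 'wanted') are dropped.
mixed <- s_attributes_in_call(
  quote(speaker %in% wanted), known_s_attributes
)

print(single)
print(combined)
print(mixed)
```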

Examples

use("polmineR")
#> ... activating corpus: GERMAPARLMINI (version: 0.0.1 | build date: 2019-02-23)
#> ... activating corpus: REUTERS
s_attributes("GERMAPARLMINI")
#> [1] "interjection" "date" "party" "speaker"
s_attributes("GERMAPARLMINI", "date") # dates of plenary meetings
#> [1] "2009-10-27" "2009-10-28" "2009-11-10" "2009-11-11" "2009-11-12"
s_attributes("GERMAPARLMINI", s_attribute = c("date", "party"))
#>           date           party
#>  1: 2009-10-27              NA
#>  2: 2009-10-27         CDU_CSU
#>  3: 2009-10-27             SPD
#>  4: 2009-10-27             FDP
#>  5: 2009-10-27       DIE_LINKE
#>  6: 2009-10-27 B90_DIE_GRUENEN
#>  7: 2009-10-28              NA
#>  8: 2009-10-28         CDU_CSU
#>  9: 2009-10-28             FDP
#> 10: 2009-11-10              NA
#> 11: 2009-11-10         CDU_CSU
#> 12: 2009-11-10             SPD
#> 13: 2009-11-10             FDP
#> 14: 2009-11-10       DIE_LINKE
#> 15: 2009-11-10 B90_DIE_GRUENEN
#> 16: 2009-11-11              NA
#> 17: 2009-11-11             FDP
#> 18: 2009-11-11             SPD
#> 19: 2009-11-11         CDU_CSU
#> 20: 2009-11-11       DIE_LINKE
#> 21: 2009-11-11 B90_DIE_GRUENEN
#> 22: 2009-11-12              NA
#> 23: 2009-11-12             FDP
#> 24: 2009-11-12             SPD
#> 25: 2009-11-12         CDU_CSU
#> 26: 2009-11-12 B90_DIE_GRUENEN
#> 27: 2009-11-12       DIE_LINKE
#>           date           party
s_attributes(corpus("GERMAPARLMINI"))
#> [1] "interjection" "date" "party" "speaker"
p <- partition("GERMAPARLMINI", date = "2009-11-10")
#> ... get encoding: latin1
#> ... get cpos and strucs
s_attributes(p)
#> [1] "interjection" "date" "party" "speaker"
s_attributes(p, "speaker") # get names of speakers
#>  [1] "Norbert Lammert"         "Angela Dorothea Merkel"
#>  [3] "Frank-Walter Steinmeier" "Birgit Homburger"
#>  [5] "Wolfgang Thierse"        "Oskar Lafontaine"
#>  [7] "Jürgen Trittin"          "Volker Kauder"
#>  [9] "Joachim Poß"             "Hans-Peter Friedrich"
#> [11] "Agnes Krumwiede"         "Gerda Hasselfeldt"
#> [13] "Arnold Vaatz"            "Guido Westerwelle"
#> [15] "Gernot Erler"            "Andreas Schockenhoff"
#> [17] "Jan van Aken"            "Frithjof Schmidt"
#> [19] "Dirk Niebel"             "Sascha Raabe"
#> [21] "Angelica Schwall-Düren"  "Karl-Theodor zu Guttenberg"
#> [23] "Wolfgang Gehrcke"        "Omid Nouripour"
#> [25] "Petra Pau"               "Ilse Aigner"
#> [27] "Waltraud Wolff"          "Hans-Michael Goldmann"
#> [29] "Katrin Göring-Eckardt"   "Kirsten Tackmann"
#> [31] "Ulrike Höfken"           "Peter Bleser"
#> [33] "Elvira Drobinski-Weiß"   "Christel Happach-Kasan"
#> [35] "Caren Lay"               "Johannes Röring"
#> [37] "Wilhelm Priesmeier"
# Get s-attributes occurring in a call
s_attributes(quote(grep("Merkel", speaker)), corpus = "GERMAPARLMINI")
#> [1] "speaker"
s_attributes(quote(speaker == "Angela Merkel"), corpus = "GERMAPARLMINI")
#> [1] "speaker"
s_attributes(quote(speaker != "Angela Merkel"), corpus = "GERMAPARLMINI")
#> [1] "speaker"
s_attributes(
  quote(speaker == "Angela Merkel" & date == "2009-10-28"),
  corpus = "GERMAPARLMINI"
)
#> [1] "speaker" "date"