Split an entire corpus or a partition into speeches. The heuristic first splits
the corpus/partition into partitions on a day-by-day basis, using the
s-attribute provided by `s_attribute_date`. These subcorpora are then
split into speeches by speaker name, using the s-attribute
`s_attribute_name`. If a speaker's contributions are separated by a gap larger
than the number of tokens supplied by the argument `gap`, they are assumed to be
two separate speeches.
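The heuristic described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the `regions` data frame, its column names and the helper `split_speeches()` are hypothetical, and the gap is assumed to be the number of tokens lying between two consecutive regions of the same speaker on the same day.

```r
# Hypothetical region table: one row per annotated region, with the
# corpus positions (left/right) it spans. Not a polmineR data structure.
regions <- data.frame(
  date    = c("2009-10-27", "2009-10-27", "2009-10-27", "2009-10-27", "2009-10-28"),
  speaker = c("A", "B", "A", "A", "A"),
  left    = c(0L, 100L, 150L, 900L, 1000L),
  right   = c(99L, 149L, 200L, 950L, 1100L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number the speeches of each speaker on each day.
split_speeches <- function(regions, gap = 500L) {
  # Step 1 and 2: group by day, then by speaker name.
  regions <- regions[order(regions$date, regions$speaker, regions$left), ]
  key <- paste(regions$speaker, regions$date, sep = "_")
  # Step 3: within each group, start a new speech whenever the number of
  # tokens between two consecutive regions exceeds `gap`.
  regions$speech_no <- ave(seq_len(nrow(regions)), key, FUN = function(i) {
    previous_right <- c(NA, regions$right[i][-length(i)])
    distance <- regions$left[i] - previous_right - 1L
    cumsum(c(TRUE, distance[-1] > gap))
  })
  # Name pattern: speaker, date and index, concatenated by underscores.
  regions$name <- paste(key, regions$speech_no, sep = "_")
  regions
}

result <- split_speeches(regions, gap = 500L)
print(result[, c("name", "left", "right")])
```

In this toy data, speaker A's third region on 2009-10-27 starts 699 tokens after the previous one ends, so with `gap = 500` it is counted as a second speech, while the 50-token interruption by speaker B is not enough to split the first one.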
    as.speeches(.Object, ...)

    # S4 method for partition
    as.speeches(
      .Object,
      s_attribute_date = grep("date", s_attributes(.Object), value = TRUE),
      s_attribute_name = grep("name", s_attributes(.Object), value = TRUE),
      gap = 500,
      mc = FALSE,
      verbose = TRUE,
      progress = TRUE
    )

    # S4 method for subcorpus
    as.speeches(
      .Object,
      s_attribute_date = grep("date", s_attributes(.Object), value = TRUE),
      s_attribute_name = grep("name", s_attributes(.Object), value = TRUE),
      gap = 500,
      mc = FALSE,
      verbose = TRUE,
      progress = TRUE
    )

    # S4 method for corpus
    as.speeches(
      .Object,
      s_attribute_date = grep("date", s_attributes(.Object), value = TRUE),
      s_attribute_name = grep("name", s_attributes(.Object), value = TRUE),
      gap = 500,
      mc = FALSE,
      verbose = TRUE,
      progress = TRUE
    )

    # S4 method for character
    as.speeches(
      .Object,
      s_attribute_date = grep("date", s_attributes(.Object), value = TRUE),
      s_attribute_name = grep("name", s_attributes(.Object), value = TRUE),
      gap = 500,
      mc = FALSE,
      verbose = TRUE,
      progress = TRUE
    )

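Argument | Description |
---|---|
.Object | A `partition`, `subcorpus` or `corpus` object, or a length-one `character` vector with the ID of a corpus. |
... | Further arguments. |
s_attribute_date | A length-one `character` vector, the s-attribute that provides the dates. |
s_attribute_name | A length-one `character` vector, the s-attribute that provides the speaker names. |
gap | An `integer` value, the number of tokens above which a gap between contributions of a speaker is taken to separate two speeches. Defaults to 500. |
mc | Whether to use multicore, defaults to `FALSE`. |
verbose | A `logical` value, defaults to `TRUE`. |
progress | A `logical` value, defaults to `TRUE`. |
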
A `partition_bundle`. The names of the objects in the bundle are
the speaker name, the date of the speech and an index for the number of the
speech on a given day, concatenated by underscores.
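Because the names follow a fixed pattern, they can be taken apart again. The following is a minimal sketch in base R, assuming dates are formatted as `YYYY-MM-DD` (as in the example output on this page). The vector `bundle_names` is typed in by hand here; in practice it would presumably be obtained from the bundle itself.

```r
# Hand-typed sample names, copied from the example output on this page.
bundle_names <- c(
  "Heinz Riesenhuber_2009-10-27_1",
  "Volker Kauder_2009-10-27_1"
)

# Anchor the date and index at the end of the string, so that a speaker
# name containing an underscore would still be captured as a whole.
pattern <- "^(.*)_([0-9]{4}-[0-9]{2}-[0-9]{2})_([0-9]+)$"
parts <- regmatches(bundle_names, regexec(pattern, bundle_names))

# Element 1 of each match is the full string; 2 to 4 are the groups.
speech_meta <- data.frame(
  speaker = vapply(parts, `[`, character(1), 2),
  date    = as.Date(vapply(parts, `[`, character(1), 3)),
  index   = as.integer(vapply(parts, `[`, character(1), 4)),
  stringsAsFactors = FALSE
)
print(speech_meta)
```

Matching from the right rather than splitting on every underscore is a deliberate choice: only the last two fields have a predictable shape.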

    speeches <- as.speeches(
      "GERMAPARLMINI",
      s_attribute_date = "date",
      s_attribute_name = "speaker"
    )
    speeches_count <- count(speeches, p_attribute = "word")
    tdm <- as.TermDocumentMatrix(speeches_count, col = "count")

    # `bt` is not defined in this excerpt; the output below suggests a
    # partition of GERMAPARLMINI covering 2009-10-27.
    speeches <- as.speeches(bt, s_attribute_name = "speaker")
    #>                                 name size
    #> 1     Heinz Riesenhuber_2009-10-27_1 4766
    #> 2         Volker Kauder_2009-10-27_1   38
    #> 3       Norbert Lammert_2009-10-27_1 4441
    #> 4     Gerda Hasselfeldt_2009-10-27_1   23
    #> 5      Wolfgang Thierse_2009-10-27_1   14
    #> 6    Hermann Otto Solms_2009-10-27_1   17
    #> 7             Petra Pau_2009-10-27_1   25
    #> 8 Katrin Göring-Eckardt_2009-10-27_1   17