Get matches for a query in a CQP corpus (subcorpus, partition etc.), optionally using the CQP syntax of the Corpus Workbench (CWB).
cpos(.Object, ...)

# S4 method for corpus
cpos(
  .Object,
  query,
  p_attribute = getOption("polmineR.p_attribute"),
  cqp = is.cqp,
  regex = FALSE,
  check = TRUE,
  verbose = TRUE,
  ...
)

# S4 method for character
cpos(
  .Object,
  query,
  p_attribute = getOption("polmineR.p_attribute"),
  cqp = is.cqp,
  check = TRUE,
  verbose = TRUE,
  ...
)

# S4 method for slice
cpos(
  .Object,
  query,
  cqp = is.cqp,
  check = TRUE,
  p_attribute = getOption("polmineR.p_attribute"),
  verbose = TRUE,
  ...
)

# S4 method for partition
cpos(
  .Object,
  query,
  cqp = is.cqp,
  check = TRUE,
  p_attribute = getOption("polmineR.p_attribute"),
  verbose = TRUE,
  ...
)

# S4 method for subcorpus
cpos(
  .Object,
  query,
  cqp = is.cqp,
  check = TRUE,
  p_attribute = getOption("polmineR.p_attribute"),
  verbose = TRUE,
  ...
)

# S4 method for matrix
cpos(.Object)

# S4 method for hits
cpos(.Object)

# S4 method for `NULL`
cpos(.Object)
.Object | A length-one |
---|---|
... | Used for reasons of backwards compatibility to process arguments that have been renamed (e.g. |
query | A |
p_attribute | The p-attribute to search. Needs to be stated only if query is not a CQP query. Defaults to |
cqp | Either logical ( |
regex | Interpret |
check | A |
verbose | A |
Unless .Object is a matrix, the return value is a matrix with two columns. The first column reports the left/starting corpus positions (cpos) of the hits obtained. The second column reports the right/ending corpus positions of the respective hit. The number of rows is the number of hits. If there are no hits, a NULL object is returned.
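Because the return value is either a two-column matrix or NULL, code that consumes it usually has to handle both cases. The following is a minimal base R sketch of that pattern. The matrix is built by hand (its values are copied from the "Saudi Arabia" example output on this page), and count_hits() and hit_lengths() are hypothetical helpers introduced for illustration; they are not part of polmineR.

```r
# Hand-built region matrix shaped like a cpos() result: one row per hit,
# left corpus position in column 1, right corpus position in column 2.
regions <- matrix(
  c(2193L, 2194L,
    2246L, 2247L,
    2614L, 2615L),
  ncol = 2, byrow = TRUE
)

# Illustrative helper (an assumption, not a polmineR function):
# number of hits, treating NULL (no hits) as zero.
count_hits <- function(m) {
  if (is.null(m)) 0L else nrow(m)
}

# Illustrative helper (an assumption, not a polmineR function):
# number of tokens covered by each hit, i.e. right - left + 1.
hit_lengths <- function(m) {
  if (is.null(m)) integer(0) else m[, 2] - m[, 1] + 1L
}

count_hits(regions)   # three two-token hits
hit_lengths(regions)
count_hits(NULL)      # the "no hits" case
```

Checking for NULL before calling nrow() matters here, since nrow(NULL) is itself NULL rather than 0.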
If the cpos()-method is applied to a character or partition object, the result is a two-column matrix with the regions (start and end corpus positions of the matches) for a query. CQP syntax can be used. The encoding of the query is adjusted to conform to the encoding of the CWB corpus. If there are no matches, NULL is returned.
If the cpos()-method is called on a matrix object, the cpos matrix is unfolded: the return value is an integer vector with the individual corpus positions.
If .Object is a hits object, an integer vector is returned with the individual corpus positions.
If .Object is a matrix, it is assumed to be a region matrix, i.e. a two-column matrix with left and right corpus positions in the first and second column, respectively. For many operations, such as decoding the token stream, it is necessary to inflate the denoted regions into a vector of all corpus positions referred to by the regions defined in the matrix. The cpos-method for matrix objects performs this task robustly.
If .Object is NULL, the method will return an empty integer vector. Used internally to handle NULL objects that may be returned from the cpos-method if no matches are obtained for a query.
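The unfolding of a region matrix and the NULL case described above can be sketched in base R as follows. This only illustrates the idea; it is not polmineR's implementation, and unfold_regions() is a hypothetical name chosen for this example.

```r
# Hypothetical stand-in for the matrix and NULL methods described above;
# an illustrative sketch, not the code polmineR actually runs.
unfold_regions <- function(m) {
  # No matches: return an empty integer vector.
  if (is.null(m)) return(integer(0))
  stopifnot(is.matrix(m), ncol(m) == 2L)
  # A matrix with zero rows also denotes no corpus positions.
  if (nrow(m) == 0L) return(integer(0))
  # Expand each row (left, right) into the sequence left:right,
  # then concatenate the sequences into one vector.
  unlist(Map(seq.int, m[, 1], m[, 2]), use.names = FALSE)
}

# Hand-built region matrix: one single-token region, two two-token regions.
regions <- matrix(
  c(  15L,   15L,
    2193L, 2194L,
    2246L, 2247L),
  ncol = 2, byrow = TRUE
)

unfold_regions(regions)  # five corpus positions in total
unfold_regions(NULL)     # empty integer vector
```

Returning integer(0) rather than NULL means downstream code can call length() on the result or loop over it with seq_along() without a separate NULL check.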
# looking up single tokens
cpos("REUTERS", query = "oil")
#>       [,1] [,2]
#>  [1,]   15   15
#>  [2,]   50   50
#>  [3,]   57   57
#>  [4,]   72   72
#>  [5,]   89   89
#>  [6,]  119  119
#>  [7,]  121  121
#>  [8,]  129  129
#>  [9,]  163  163
#> [10,]  173  173
#> [11,]  177  177
#> [12,]  200  200
#> [13,]  243  243
#> [14,]  300  300
#> [15,]  370  370
#> [16,]  473  473
#> [17,]  494  494
#> [18,]  549  549
#> [19,]  585  585
#> [20,]  608  608
#> [21,]  773  773
#> [22,]  780  780
#> [23,]  840  840
#> [24,] 1053 1053
#> [25,] 1091 1091
#> [26,] 1183 1183
#> [27,] 1244 1244
#> [28,] 1264 1264
#> [29,] 1294 1294
#> [30,] 1570 1570
#> [31,] 1689 1689
#> [32,] 1704 1704
#> [33,] 1818 1818
#> [34,] 1830 1830
#> [35,] 1953 1953
#> [36,] 2087 2087
#> [37,] 2112 2112
#> [38,] 2165 2165
#> [39,] 2189 2189
#> [40,] 2207 2207
#> [41,] 2295 2295
#> [42,] 2346 2346
#> [43,] 2451 2451
#> [44,] 2498 2498
#> [45,] 2520 2520
#> [46,] 2641 2641
#> [47,] 2785 2785
#> [48,] 2843 2843
#> [49,] 2875 2875
#> [50,] 2892 2892
#> [51,] 2920 2920
#> [52,] 2929 2929
#> [53,] 2984 2984
#> [54,] 3008 3008
#> [55,] 3026 3026
#> [56,] 3053 3053
#> [57,] 3072 3072
#> [58,] 3095 3095
#> [59,] 3144 3144
#> [60,] 3152 3152
#> [61,] 3183 3183
#> [62,] 3211 3211
#> [63,] 3252 3252
#> [64,] 3314 3314
#> [65,] 3319 3319
#> [66,] 3368 3368
#> [67,] 3412 3412
#> [68,] 3463 3463
#> [69,] 3468 3468
#> [70,] 3517 3517
#> [71,] 3585 3585
#> [72,] 3611 3611
#> [73,] 3645 3645
#> [74,] 3710 3710
#> [75,] 3749 3749
#> [76,] 3785 3785
#> [77,] 3835 3835
#> [78,] 3999 3999

#>       [,1] [,2]
#>  [1,] 1689 1689
#>  [2,] 1704 1704
#>  [3,] 2165 2165
#>  [4,] 2189 2189
#>  [5,] 2207 2207
#>  [6,] 2295 2295
#>  [7,] 2346 2346
#>  [8,] 2451 2451
#>  [9,] 2498 2498
#> [10,] 2520 2520
#> [11,] 2641 2641
#> [12,] 2785 2785
#> [13,] 2843 2843
#> [14,] 2875 2875
#> [15,] 2892 2892
#> [16,] 2920 2920
#> [17,] 2929 2929
#> [18,] 2984 2984
#> [19,] 3008 3008
#> [20,] 3026 3026
#> [21,] 3053 3053

# using CQP query syntax
cpos("REUTERS", query = '"Saudi" "Arabia"')
#>      [,1] [,2]
#> [1,] 2193 2194
#> [2,] 2246 2247
#> [3,] 2614 2615
#> [4,] 2935 2936
#> [5,] 3012 3013
#> [6,] 3036 3037

corpus("REUTERS") %>%
  subset(grepl("saudi-arabia", places)) %>%
  cpos(query = '"Saudi" "Arabia"', cqp = TRUE)
#>      [,1] [,2]
#> [1,] 2193 2194
#> [2,] 2246 2247
#> [3,] 2614 2615
#> [4,] 2935 2936
#> [5,] 3012 3013
#> [6,] 3036 3037

partition("REUTERS", places = "saudi-arabia", regex = TRUE) %>%
  cpos(query = '"Saudi" "Arabia"', cqp = TRUE)
#>      [,1] [,2]
#> [1,] 2193 2194
#> [2,] 2246 2247
#> [3,] 2614 2615
#> [4,] 2935 2936
#> [5,] 3012 3013
#> [6,] 3036 3037