Get a matrix with moving windows over a corpus. Negative integer values indicate the absence of a token at the respective window position.
get_cbow_matrix(
  corpus,
  p_attribute,
  registry = Sys.getenv("CORPUS_REGISTRY"),
  matrix,
  window
)
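corpus       a CWB corpus
p_attribute  a positional attribute
registry     the registry directory
matrix       a matrix, e.g. as returned by get_region_matrix()
window       window size, i.e. the number of positions to the left and to the right of the node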
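library(RcppCWB)

# Get the regions for strucs 0 to 5 of the s-attribute "places"
m <- get_region_matrix(
  corpus = "REUTERS", s_attribute = "places",
  strucs = 0L:5L, registry = get_tmp_registry()
)

# Build the matrix of moving windows for these regions
windowsize <- 3L
m2 <- get_cbow_matrix(
  corpus = "REUTERS", p_attribute = "word",
  registry = get_tmp_registry(), matrix = m, window = windowsize
)

# Label the columns by their position relative to the node
colnames(m2) <- c(-windowsize:-1, "node", 1:windowsize)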
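
The shape of such a window matrix can be illustrated without a CWB corpus. The following base R sketch is not the RcppCWB implementation: the helper toy_window_matrix() and its arguments are hypothetical, the input is an arbitrary integer vector standing in for the tokens of a single region, and it assumes that windows are cut off at the region edges, with -1 as the negative filler value and window >= 1.

```r
# Hypothetical helper, not part of RcppCWB: builds a moving-window matrix
# for one region of integer token values using base R only.
toy_window_matrix <- function(ids, window, pad = -1L) {
  n <- length(ids)
  offsets <- seq(-window, window)
  # positions[i, j]: index of the token in the j-th window slot around token i
  positions <- outer(seq_len(n), offsets, `+`)
  # Slots that fall outside the region keep the (negative) filler value
  inside <- positions >= 1L & positions <= n
  m <- matrix(pad, nrow = n, ncol = length(offsets))
  m[inside] <- ids[positions[inside]]
  colnames(m) <- c(-window:-1, "node", 1:window)
  m
}

ids <- c(10L, 11L, 12L, 13L)
toy_window_matrix(ids, window = 2L)
```

Each row holds one node token with its left and right context, so the result has 2 * window + 1 columns. In this sketch the first row is -1, -1, 10, 11, 12, because the first token has no left neighbours.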