Extract information from the internal C representation of registry data.

corpus_data_dir(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))

corpus_info_file(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))

corpus_full_name(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))

corpus_p_attributes(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))

corpus_s_attributes(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))

corpus_properties(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))

corpus_property(corpus, registry = Sys.getenv("CORPUS_REGISTRY"), property)

corpus_registry_dir(corpus)

Arguments

corpus

A length-one character vector with the corpus ID.

registry

A length-one character vector with the registry directory.

property

A corpus property defined in the registry file.

Details

corpus_data_dir() will return the data directory (an fs_path object) where the binary files of a corpus are kept, also known as the 'home' directory of the corpus.

corpus_info_file() will return the path to the info file of a corpus (an fs_path object). If the info file does not exist or the INFO line is missing from the registry file, NA is returned.

corpus_full_name() will return the full name of the corpus defined in the registry file.

corpus_p_attributes() returns a character vector with the positional attributes of a corpus.

corpus_s_attributes() returns a character vector with the structural attributes of a corpus.

corpus_properties() returns a character vector with the corpus properties defined in the registry file. If the corpus cannot be located, NA is returned.

corpus_property() returns the value of a corpus property defined in the registry file, or NA if the corpus does not exist, is not loaded, or if the requested property is undefined.

corpus_registry_dir() will extract the registry directory containing the registry file that defines a corpus from the internal C representation of loaded corpora. The character vector that is returned may have a length greater than 1 if several corpora with the same ID are defined in registry files in different registry directories. If the corpus is not found, NA is returned.
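
Because several of these functions return NA rather than raising an error, calling code may want to check the result before using it. The following is a minimal sketch of that pattern, not part of RcppCWB: na_to_default() is a hypothetical helper introduced here for illustration, and the property name "no_such_property" is an assumed example of an undefined property.

```r
library(RcppCWB)

# Hypothetical helper (not part of RcppCWB): replace a single NA result
# with a caller-supplied default, and pass any other value through.
na_to_default <- function(x, default) {
  if (length(x) == 1L && is.na(x)) default else x
}

registry <- get_tmp_registry()

# A property that is defined in the registry file is passed through unchanged.
lang <- na_to_default(
  corpus_property("REUTERS", registry = registry, property = "language"),
  default = "unknown"
)

# "no_such_property" is assumed to be undefined for this corpus, so
# corpus_property() should return NA and the default is used instead.
missing <- na_to_default(
  corpus_property("REUTERS", registry = registry, property = "no_such_property"),
  default = "unknown"
)
```

The length check keeps the helper from collapsing multi-element results, such as the vector corpus_registry_dir() may return when a corpus ID is defined in more than one registry directory.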

Examples

corpus_data_dir("REUTERS", registry = get_tmp_registry())
#> /Users/runner/work/_temp/Library/RcppCWB/extdata/cwb/indexed_corpora/reuters
corpus_info_file("REUTERS", registry = get_tmp_registry())
#> /Users/runner/work/_temp/Library/RcppCWB/extdata/cwb/indexed_corpora/reuters/info.md
corpus_full_name("REUTERS", registry = get_tmp_registry())
#> [1] "Reuters Sample Corpus"
corpus_p_attributes("REUTERS", registry = get_tmp_registry())
#> [1] "word"
corpus_s_attributes("REUTERS", registry = get_tmp_registry())
#> [1] "id"         "topics_cat" "places"     "language"  
corpus_properties("REUTERS", registry = get_tmp_registry())
#> [1] "language" "charset" 
corpus_property(
  "REUTERS",
  registry = get_tmp_registry(),
  property = "language"
)
#> [1] "en"
corpus_registry_dir("REUTERS")
#> /private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/Rtmpk6lOF7/registry_tmp
#> /Users/runner/work/_temp/Library/RcppCWB/extdata/cwb/registry
#> /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/Rtmpk6lOF7/registry_tmp
corpus_registry_dir("FOO") # NA returned
#> [1] NA