Extract information from the internal C representation of registry data.
corpus_data_dir(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
corpus_info_file(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
corpus_full_name(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
corpus_p_attributes(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
corpus_s_attributes(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
corpus_properties(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
corpus_property(corpus, registry = Sys.getenv("CORPUS_REGISTRY"), property)
corpus_registry_dir(corpus)
corpus
A length-one character vector with the corpus ID.
registry
A length-one character vector with the registry directory.
property
A corpus property defined in the registry file.
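In the usage above, every function except corpus_registry_dir() defaults to registry = Sys.getenv("CORPUS_REGISTRY"). R evaluates default arguments lazily, during the call. Setting the environment variable beforehand, or passing registry explicitly, therefore decides which registry directory is used. The base R sketch below illustrates this with a hypothetical stand-in, show_registry(), which is not part of RcppCWB. Returning NA for an unset variable is an assumption made only for this illustration.

```r
# Hypothetical stand-in (not RcppCWB) with the same default as the functions above
show_registry <- function(registry = Sys.getenv("CORPUS_REGISTRY")) {
  # Sys.getenv() returns "" for an unset variable; map that to NA (assumption)
  if (!nzchar(registry)) return(NA_character_)
  registry
}

old <- Sys.getenv("CORPUS_REGISTRY", unset = NA)  # remember the current value
Sys.unsetenv("CORPUS_REGISTRY")
show_registry()                                   # NA: no default registry
Sys.setenv(CORPUS_REGISTRY = "/tmp/registry")
show_registry()                                   # "/tmp/registry"
show_registry(registry = "/other/dir")            # explicit argument wins
# restore the original state of the environment variable
if (is.na(old)) Sys.unsetenv("CORPUS_REGISTRY") else Sys.setenv(CORPUS_REGISTRY = old)
```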
corpus_data_dir() returns the data directory (class fs_path) where the binary files of a corpus are kept (also known as the 'home' directory).
corpus_info_file() returns the path to the info file of a corpus (an fs_path object). If the info file does not exist or the INFO line is missing in the registry file, NA is returned.
corpus_full_name() returns the full name of the corpus as defined in the registry file.
corpus_p_attributes() returns a character vector with the positional attributes of a corpus.
corpus_s_attributes() returns a character vector with the structural attributes of a corpus.
corpus_properties() returns a character vector with the corpus properties defined in the registry file. If the corpus cannot be located, NA is returned.
corpus_property() returns the value of a corpus property defined in the registry file, or NA if the corpus does not exist, is not loaded, or if the requested property is undefined.
corpus_registry_dir() extracts the registry directory containing the registry file that defines a corpus from the internal C representation of loaded corpora. The returned character vector may have a length > 1 if corpora with the same ID are defined by registry files in different registry directories. If the corpus is not found, NA is returned.
corpus_data_dir("REUTERS", registry = get_tmp_registry())
#> /Users/runner/work/_temp/Library/RcppCWB/extdata/cwb/indexed_corpora/reuters
corpus_info_file("REUTERS", registry = get_tmp_registry())
#> /Users/runner/work/_temp/Library/RcppCWB/extdata/cwb/indexed_corpora/reuters/info.md
corpus_full_name("REUTERS", registry = get_tmp_registry())
#> [1] "Reuters Sample Corpus"
corpus_p_attributes("REUTERS", registry = get_tmp_registry())
#> [1] "word"
corpus_s_attributes("REUTERS", registry = get_tmp_registry())
#> [1] "id" "topics_cat" "places" "language"
corpus_properties("REUTERS", registry = get_tmp_registry())
#> [1] "language" "charset"
corpus_property(
"REUTERS",
registry = get_tmp_registry(),
property = "language"
)
#> [1] "en"
corpus_registry_dir("REUTERS")
#> /private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/Rtmpk6lOF7/registry_tmp
#> /Users/runner/work/_temp/Library/RcppCWB/extdata/cwb/registry
#> /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/Rtmpk6lOF7/registry_tmp
corpus_registry_dir("FOO") # NA returned
#> [1] NA
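As the last two examples show, corpus_registry_dir() can report several directories for one corpus ID and NA for an unknown one. RcppCWB derives this from corpora already loaded by the CWB C library. The sketch below approximates the same idea on the file system instead, assuming the CWB convention that a registry file is named after the lowercase corpus ID. find_registry_dirs() is hypothetical and not part of RcppCWB, and the two temporary registry directories are created only for the illustration.

```r
# Hypothetical helper (not RcppCWB): directories that hold a registry file for
# `corpus`, assuming the file is named after the lowercase corpus ID
find_registry_dirs <- function(corpus, dirs) {
  candidates <- file.path(dirs, tolower(corpus))
  found <- dirs[file.exists(candidates)]
  if (length(found) == 0L) NA_character_ else found
}

# Two temporary registry directories that both define "example"
reg1 <- file.path(tempdir(), "registry_a")
reg2 <- file.path(tempdir(), "registry_b")
dir.create(reg1, showWarnings = FALSE)
dir.create(reg2, showWarnings = FALSE)
writeLines("ID example", file.path(reg1, "example"))
writeLines("ID example", file.path(reg2, "example"))

find_registry_dirs("EXAMPLE", c(reg1, reg2))  # both directories
find_registry_dirs("FOO", c(reg1, reg2))      # NA
```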