Character sets supported by CWB — cwb

The function returns a character vector of the character sets (charsets) supported by the Corpus Workbench (CWB). The vector is derived from the CorpusCharset object defined in the header file of the corpus library (CL).

cwb_charsets()

Details

Early versions of the CWB were developed for "latin1"; support for "utf8" was introduced with CWB v3.2. Note that RcppCWB is tested only for "latin1" and "utf8", and that R by convention uses "UTF-8" where the CWB uses "utf8".

Examples

cwb_charsets()
#>  [1] "ascii"    "latin1"   "latin2"   "latin3"   "latin4"   "cyrillic"
#>  [7] "arabic"   "greek"    "hebrew"   "latin5"   "latin6"   "latin7"  
#> [13] "latin8"   "latin9"   "utf8"
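
Because the CWB writes "utf8" where R writes "UTF-8", a charset name reported by the CWB may need translating before it is passed to R functions such as iconv(). The following is a minimal sketch in base R, restricted to the two charsets RcppCWB is tested for. The helper cwb_charset_to_r() is hypothetical and not part of RcppCWB, and the sample string is made up for illustration.

```r
# Hypothetical helper (not part of RcppCWB): translate a CWB charset name
# into the encoding name expected by R functions such as iconv().
cwb_charset_to_r <- function(charset) {
  switch(
    charset,
    utf8 = "UTF-8",
    latin1 = "latin1",
    stop("charset not covered by this sketch: ", charset)
  )
}

# A latin1-encoded string, standing in for a token from a latin1 corpus.
token <- "caf\xe9"
Encoding(token) <- "latin1"

# Recode to UTF-8 for further processing in R.
token_utf8 <- iconv(
  token,
  from = cwb_charset_to_r("latin1"),
  to = cwb_charset_to_r("utf8")
)
```

Any charset other than "latin1" and "utf8" raises an error here rather than being guessed, since the names R accepts for the remaining charsets depend on the platform's iconv implementation.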