Utility functions to convert character vectors between the native encoding and the encoding of the corpus.
as.utf8(x, from)
as.nativeEnc(x, from)
as.corpusEnc(x, from = localeToCharset()[1], corpusEnc)
Argument | Description |
---|---|
x | the object (a character vector) |
from | encoding of the input character vector |
corpusEnc | encoding of the corpus (e.g. "latin1", "UTF-8") |
The encoding of a corpus and the encoding of the terminal (the native
encoding) may differ. If no conversion is carried out between the two,
output may be garbled or results may be wrong. The functions as.nativeEnc
and as.corpusEnc are auxiliary functions that perform this conversion.
The functions as.nativeEnc and as.utf8 deliberately remove the explicit
encoding declaration from their result, to avoid warnings that may occur
with character vector columns in a data.table object.
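
The idea of converting and then dropping the encoding declaration can be illustrated with base R alone. The following is a minimal sketch, not the implementation of the functions documented here: it assumes a latin1 input string and uses only iconv, Encoding and charToRaw from base R.

```r
# Assumption: an input string stored as latin1 bytes ("façile"),
# taken from the example on the base R ?Encoding help page.
x <- "fa\xE7ile"
Encoding(x) <- "latin1"

# Convert to UTF-8; iconv() marks the result as UTF-8 by default.
y <- iconv(x, from = "latin1", to = "UTF-8")
Encoding(y)

# Drop the explicit declaration. The bytes are unchanged; only the
# encoding mark on the string is reset to "unknown".
z <- y
Encoding(z) <- "unknown"
Encoding(z)
identical(charToRaw(y), charToRaw(z))
```

With the functions documented here, the corresponding calls would take the form as.utf8(x, from = "latin1") or as.nativeEnc(x, from = "latin1"), following the usage shown above.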