Utility functions to convert character vectors between the native encoding and the encoding of the corpus.

Usage

as.utf8(x, from)

as.nativeEnc(x, from)

as.corpusEnc(x, from = localeToCharset()[1], corpusEnc)

Arguments

x

the object (a character vector)

from

encoding of the input character vector

corpusEnc

encoding of the corpus (e.g. "latin1", "UTF-8")

Details

The encoding of a corpus and the encoding of the terminal (the native encoding) may differ. If no conversion is carried out between the two, output may be garbled or results may be wrong. The functions as.nativeEnc and as.corpusEnc are auxiliary functions that perform this conversion. The functions as.nativeEnc and as.utf8 deliberately remove the explicit encoding declaration from the returned vector, to avoid warnings that may occur with character vector columns in a data.table object.
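
The kind of conversion described above can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: it uses only iconv, Encoding and localeToCharset, treats "latin1" as an assumed corpus encoding, and builds a hypothetical sample string from raw bytes so that the result does not depend on the locale of the session.

```r
# Sketch with base R only; this is NOT how as.utf8, as.nativeEnc or
# as.corpusEnc are implemented. "latin1" is an assumed corpus encoding.

# Native charset of the session, as used for the default of `from` in
# as.corpusEnc. The value depends on the locale and may be NA.
native <- utils::localeToCharset()[1]

# Hypothetical sample: the UTF-8 bytes of a word with two non-ASCII letters.
x_utf8 <- rawToChar(as.raw(c(0x47, 0x72, 0xc3, 0xb6, 0xc3, 0x9f, 0x65)))
Encoding(x_utf8) <- "UTF-8"  # declare the encoding explicitly

# Convert to the assumed corpus encoding, then back to UTF-8.
x_latin1 <- iconv(x_utf8, from = "UTF-8", to = "latin1")
x_back <- iconv(x_latin1, from = "latin1", to = "UTF-8")

# Dropping the encoding declaration leaves the bytes untouched and only
# resets the mark to "unknown".
x_unmarked <- x_back
Encoding(x_unmarked) <- "unknown"
```

In this sketch the round trip restores the original bytes, and the last step shows what removing the explicit encoding declaration amounts to in base R: the content is unchanged, only the declared encoding is reset.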