Encoding            package:base            R Documentation(latin1)

_R_e_a_d _o_r _S_e_t _t_h_e _D_e_c_l_a_r_e_d _E_n_c_o_d_i_n_g_s _f_o_r _a _C_h_a_r_a_c_t_e_r _V_e_c_t_o_r

_D_e_s_c_r_i_p_t_i_o_n:

     Read or set the declared encodings for a character vector.

_U_s_a_g_e:

     Encoding(x)

     Encoding(x) <- value

_A_r_g_u_m_e_n_t_s:

       x: A character vector.

   value: A character vector of positive length.

_D_e_t_a_i_l_s:

     Character strings in R can be declared to be in '"latin1"' or
     '"UTF-8"'.  'Encoding' reads these declarations and returns a
     character vector of values '"latin1"', '"UTF-8"' or '"unknown"'.
     The replacement form sets them: 'value' is recycled as needed,
     and any other value is silently treated as '"unknown"'.  As from
     R 2.8.0, ASCII strings will never be marked with a declared
     encoding, since their representation is the same in all
     encodings.
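
     As an illustrative sketch of the replacement form (the strings
     below are made-up sample data), a single 'value' is recycled
     over the vector, an ASCII element stays unmarked, and an
     unrecognised value resets the marks:

```r
# Two Latin-1 byte strings plus one pure-ASCII string (illustrative data).
x <- c("fa\xE7ile", "na\xEFve", "plain")

# A length-one 'value' is recycled across all three elements.
Encoding(x) <- "latin1"
marked <- Encoding(x)
print(marked)   # the ASCII element is reported as "unknown"

# An unrecognised value is silently treated as "unknown".
Encoding(x) <- "no-such-encoding"
reset <- Encoding(x)
print(reset)
```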

     There are other ways for character strings to acquire a declared
     encoding apart from explicitly setting it.  Functions 'scan',
     'read.table', 'readLines' and 'parse' have an 'encoding' argument
     that is used to declare encodings, 'iconv' declares the encoding
     of its result from its 'to' argument, and console input in
     suitable locales is also declared.  'intToUtf8' declares its
     output as '"UTF-8"', and output text connections are marked if
     running in a suitable locale.
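
     A minimal sketch of two of these routes, assuming the default
     behaviour of 'intToUtf8' and 'iconv' (the variable names are
     illustrative only):

```r
# intToUtf8() builds a string from Unicode code points; 233 is U+00E9.
e_acute <- intToUtf8(233L)
enc_int <- Encoding(e_acute)

# iconv() converts the Latin-1 bytes to UTF-8 and marks the converted result.
y <- iconv("fa\xE7ile", from = "latin1", to = "UTF-8")
enc_conv <- Encoding(y)

print(c(enc_int, enc_conv))
```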

     Most character manipulation functions will set the encoding on
     output strings if it was declared on the corresponding input.
     These include 'chartr', 'strsplit', 'strtrim', 'substr', 'tolower'
     and 'toupper' as well as 'sub(useBytes = FALSE)' and
     'gsub(useBytes = FALSE)'.  (Also, under some circumstances 'paste'
     will set an encoding.)  Note that such functions do not
     _preserve_ the encoding: rather, if they know the input encoding
     and that the string has been successfully re-encoded to the
     current encoding, they mark the output with the latter (if it is
     '"latin1"' or '"UTF-8"').

     As from R 2.7.0 'substr' does preserve the encoding, and 'chartr',
     'tolower' and 'toupper' preserve UTF-8 encoding on systems with
     Unicode wide characters.  With their 'fixed' and 'perl' options,
     'strsplit', 'sub' and 'gsub' will give a UTF-8 result if any of
     the inputs are UTF-8.
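
     The following sketch, written against current versions of R,
     shows how the mark on a 'substr' result can be inspected.  The
     sample string is illustrative; a substring that happens to be
     pure ASCII carries no mark, as for any ASCII string.

```r
# Build a UTF-8 string from Latin-1 bytes (illustrative data).
u <- iconv("fa\xE7ile", from = "latin1", to = "UTF-8")

# The first three characters include the non-ASCII c-cedilla.
head3 <- substr(u, 1, 3)
# The first two characters are pure ASCII, so no mark is kept.
head2 <- substr(u, 1, 2)
print(Encoding(c(head3, head2)))

# Case conversion of a UTF-8 string; the mark on the result can be
# inspected in the same way.
print(Encoding(toupper(u)))
```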

     As from R 2.8.0 'paste' and 'sprintf' return a UTF-8 encoded
     element if any of the inputs to that element are UTF-8.
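
     As a small sketch of this per-element rule (the inputs are
     illustrative), only the result elements that received a UTF-8
     input are marked:

```r
# A UTF-8 string built from code points (102 97 231 105 108 101).
u <- intToUtf8(c(102, 97, 231, 105, 108, 101))

# Element 1 combines a UTF-8 input with ASCII; element 2 is all ASCII.
p <- paste(c(u, "abc"), "x")
print(Encoding(p))

# sprintf() with a UTF-8 argument substituted into an ASCII format.
s <- sprintf("%s!", u)
print(Encoding(s))
```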

_V_a_l_u_e:

     A character vector.

_E_x_a_m_p_l_e_s:

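     ## x is intended to be in latin1
     x <- "fa\xE7ile"
     Encoding(x)                 # not yet declared
     Encoding(x) <- "latin1"     # declare the bytes as latin1
     x
     ## re-encode to UTF-8; the result carries its own declaration
     xx <- iconv(x, "latin1", "UTF-8")
     Encoding(c(x, xx))          # "latin1" "UTF-8"
     c(x, xx)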

