iconv                 package:utils                 R Documentation

_C_o_n_v_e_r_t _C_h_a_r_a_c_t_e_r _V_e_c_t_o_r _b_e_t_w_e_e_n _E_n_c_o_d_i_n_g_s

_D_e_s_c_r_i_p_t_i_o_n:

     This uses system facilities to convert a character vector between
     encodings: the 'i' stands for 'internationalization'.

_U_s_a_g_e:

     iconv(x, from, to, sub=NA)

     iconvlist()

_A_r_g_u_m_e_n_t_s:

       x: A character vector.

    from: A character string describing the current encoding.

      to: A character string describing the target encoding.

     sub: character string.  If not 'NA' it is used to replace any
          non-convertible bytes in the input.  (This would normally be
          a single character, but can be more.)  If '"byte"', the
          indication is '"<xx>"' with the hex code of the byte.

_D_e_t_a_i_l_s:

     The names of encodings, and which ones are available (if any
     are), are platform-dependent.  On systems that support R's
     'iconv' you can use '""' for the encoding of the current locale,
     as well as '"latin1"' and '"UTF-8"'.
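
     As a minimal sketch of the three names mentioned above, the
     following converts a latin1 string to UTF-8 and then to the
     encoding of the current locale.  The variable names are
     illustrative only, and the last result depends on the locale.

```r
## A latin1-encoded string: "facile" with a c-cedilla stored as byte E7.
x <- "fa\xE7ile"

## "latin1" and "UTF-8" are accepted wherever R's iconv is supported.
x_utf8 <- iconv(x, from = "latin1", to = "UTF-8")
charToRaw(x_utf8)   # the c-cedilla becomes the two bytes c3 a7

## "" means the encoding of the current locale, so this result is
## locale-dependent (it may be NA if the locale cannot represent it).
x_native <- iconv(x, from = "latin1", to = "")
```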

     On many platforms 'iconvlist' provides an alphabetical list of the
     supported encodings.  On others, the information is on the man
     page for 'iconv(5)' or elsewhere in the man pages (beware that
     the system command 'iconv' may not support the same set of
     encodings as the C functions R calls).  Unfortunately, encoding
     names are rarely the same across platforms.
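
     Because names differ between platforms, it can help to check a
     name against 'iconvlist()' before relying on it.  The helper
     below is hypothetical (it is not part of R) and assumes that
     'iconvlist' returns a list on the platform in use; the
     case-insensitive match is a design choice of this sketch.

```r
## Hypothetical helper (not part of R): does this platform list an
## encoding under the given name?  Matching ignores case.
has_encoding <- function(name) {
    known <- iconvlist()
    toupper(name) %in% toupper(known)
}

## One logical per name; the answers vary between platforms.
has_encoding(c("UTF-8", "LATIN2", "ISO_8859-2"))
```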

     Elements of 'x' which cannot be converted (perhaps because they
     are invalid or because they cannot be represented in the target
     encoding) will be returned as 'NA' unless 'sub' is specified.
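
     Since failures are reported per element, 'is.na' can be used to
     locate them.  This is a sketch with made-up input; it assumes a
     platform whose 'iconv' reports the c-cedilla as non-convertible
     to ASCII rather than substituting for it silently.

```r
x <- c("plain", "fa\xE7ile")

## Without 'sub', an element that cannot be represented becomes NA.
res <- iconv(x, "latin1", "ASCII")
failed <- is.na(res)
which(failed)   # indices of the elements that could not be converted

## With 'sub', the offending bytes are replaced instead of giving NA.
res_sub <- iconv(x, "latin1", "ASCII", sub = "?")
```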

     Some versions of 'iconv' will allow transliteration by appending
     '//TRANSLIT' to the 'to' encoding: see the examples.

_V_a_l_u_e:

     A character vector of the same length and the same attributes as
     'x'.
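
     For instance, names are attributes and so should be carried
     over to the result.  The element names used here are arbitrary.

```r
## A named latin1 vector; the names are arbitrary labels.
x <- c(first = "Ekstr\xf8m", second = "J\xf6reskog")
y <- iconv(x, "latin1", "UTF-8")

length(y) == length(x)   # same length as the input
names(y)                 # names are kept
```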

_N_o_t_e:

     Not all platforms support these functions.  See also
     'capabilities("iconv")'.
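
     Portable code can guard its calls accordingly.  In this sketch
     the fallback of leaving the input unconverted is only an
     assumption for illustration; a real script might prefer to stop
     with an error.

```r
input <- "abc"

if (capabilities("iconv")) {
    out <- iconv(input, "latin1", "UTF-8")
} else {
    ## Fallback chosen for illustration only: keep the input as it is.
    out <- input
}
```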

_S_e_e _A_l_s_o:

     'localeToCharset', 'file'.

_E_x_a_m_p_l_e_s:

     ## Not run: 
     iconvlist()

     ## convert from Latin-2 to UTF-8: two of the glibc iconv variants.
     ## (here 'x' stands for any character vector encoded in Latin-2)
     iconv(x, "ISO_8859-2", "UTF-8")
     iconv(x, "LATIN2", "UTF-8")

     ## Both x below are in latin1 and will only display correctly in a
     ## latin1 locale.
     (x <- "fa\xE7ile")
     charToRaw(xx <- iconv(x, "latin1", "UTF-8"))
     ## in a UTF-8 locale, print(xx)

     iconv(x, "latin1", "ASCII")          #   NA
     iconv(x, "latin1", "ASCII", "?")     # "fa?ile"
     iconv(x, "latin1", "ASCII", "")      # "faile"
     iconv(x, "latin1", "ASCII", "byte")  # "fa<e7>ile"

     # Extracts from R help files
     (x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher"))
     iconv(x, "latin1", "ASCII//TRANSLIT")
     iconv(x, "latin1", "ASCII", sub="byte")
     ## End(Not run)

