iconv                  package:base                  R Documentation

_C_o_n_v_e_r_t _C_h_a_r_a_c_t_e_r _V_e_c_t_o_r _b_e_t_w_e_e_n _E_n_c_o_d_i_n_g_s

_D_e_s_c_r_i_p_t_i_o_n:

     This uses system facilities to convert a character vector between
     encodings: the 'i' stands for 'internationalization'.

_U_s_a_g_e:

     iconv(x, from = "", to = "", sub = NA)

     iconvlist()

_A_r_g_u_m_e_n_t_s:

       x: A character vector, or an object to be converted to a
          character vector by 'as.character'.

    from: A character string describing the current encoding.

      to: A character string describing the target encoding.

     sub: A character string.  If not 'NA', it is used to replace any
          non-convertible bytes in the input.  (This would normally be
          a single character, but can be more.)  If '"byte"', each such
          byte is replaced by '"<xx>"', where 'xx' is its hex code.

_D_e_t_a_i_l_s:

     The names of encodings and which ones are available (and indeed,
     whether any are) are platform-dependent.  On all systems that
     support 'iconv' you can use '""' for the encoding of the current
     locale, as well as '"latin1"' and '"UTF-8"'.
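
     As a minimal sketch, a round trip between the two encodings named
     above as always available looks like this; the byte values are
     those of 'c-cedilla' (U+00E7) in Latin-1 (E7) and UTF-8 (C3 A7).

```r
x <- "fa\xE7ile"                    # 0xE7 is c-cedilla in Latin-1
Encoding(x) <- "latin1"

u    <- iconv(x, "latin1", "UTF-8") # E7 becomes the two bytes C3 A7
back <- iconv(u, "UTF-8", "latin1") # and back to the single byte E7

charToRaw(u)
charToRaw(back)
```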

     On many platforms 'iconvlist' provides an alphabetical list of the
     supported encodings.  On others, the information is on the man
     page for 'iconv(5)' or elsewhere in the man pages (and beware that
     the system command 'iconv' may not support the same set of
     encodings as the C functions R calls).  Unfortunately, the names
     are rarely the same across platforms.
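
     Because names differ between platforms, it can be more direct to
     try a conversion than to search 'iconvlist()'.  The helper
     'can_convert' below is hypothetical (not part of base R); it
     assumes, as in current R versions, that an unsupported pair of
     encodings makes 'iconv' signal an error.

```r
## Hypothetical helper: does this platform's iconv accept from -> to?
## Attempts a trivial conversion and treats an error as "unsupported".
can_convert <- function(from, to) {
  res <- tryCatch(iconv("a", from, to), error = function(e) NULL)
  !is.null(res)
}

can_convert("latin1", "UTF-8")
can_convert("latin1", "NOT-AN-ENCODING")
```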

     Elements of 'x' which cannot be converted (perhaps because they
     are invalid or because they cannot be represented in the target
     encoding) will be returned as 'NA' unless 'sub' is specified.
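
     Since an input 'NA' also yields 'NA', a failed conversion can be
     detected as an 'NA' output whose input was not 'NA'.  A minimal
     sketch, with an illustrative input vector:

```r
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
x <- c("abc", x, NA)          # an ASCII string, a Latin-1 string, an NA

out <- iconv(x, "latin1", "ASCII")
## An element failed if its output is NA but its input was not
failed <- is.na(out) & !is.na(x)
which(failed)
```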

     Most versions of 'iconv' will allow transliteration by appending
     '//TRANSLIT' to the 'to' encoding: see the examples.
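
     Transliteration results vary between iconv implementations (some
     return '"?"' for characters they cannot transliterate, others
     return 'NA'), so a cautious caller may want a fallback.  The helper
     'to_ascii' below is a hypothetical sketch: it tries '//TRANSLIT'
     and falls back to 'sub = "byte"' for any element still 'NA'.

```r
## Hypothetical helper: ASCII output via //TRANSLIT where supported,
## falling back to "<xx>" byte codes for elements left unconverted.
to_ascii <- function(x, from = "") {
  out <- tryCatch(iconv(x, from, "ASCII//TRANSLIT"),
                  error = function(e) rep(NA_character_, length(x)))
  bad <- is.na(out) & !is.na(x)
  out[bad] <- iconv(x[bad], from, "ASCII", sub = "byte")
  out
}

x <- "fa\xE7ile"
Encoding(x) <- "latin1"
to_ascii(x, "latin1")   # exact result is platform-dependent
```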

     Any encoding bits (see 'Encoding') on elements of 'x' are ignored:
     the elements will always be translated as if they were in encoding
     'from', even if declared otherwise.
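
     A minimal sketch of what ignoring the declared encoding means in
     practice: a string declared as UTF-8 but converted with
     'from = "latin1"' has each of its UTF-8 bytes read as a separate
     Latin-1 character.

```r
x <- "fa\u00e7ile"   # declared UTF-8: c-cedilla stored as bytes C3 A7

## 'from' wins over the declaration: C3 and A7 are read as the Latin-1
## characters A-tilde and section sign, and each is re-encoded.
wrong <- iconv(x, "latin1", "UTF-8")
charToRaw(wrong)

## Naming the true source encoding gives the intended result.
right <- iconv(x, "UTF-8", "latin1")
charToRaw(right)
```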

     As from R 2.7.0 '"UTF8"' will be accepted as meaning the (more
     correct) '"UTF-8"'.

_V_a_l_u_e:

     A character vector of the same length and the same attributes as
     'x' (after conversion).
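
     A short sketch showing that names and dimensions survive the
     conversion; the vectors are illustrative.

```r
x <- c("abc", "fa\xE7ile")
Encoding(x) <- "latin1"
names(x) <- c("first", "second")
y <- iconv(x, "latin1", "UTF-8")
names(y)

m <- matrix(c("a", "b", "c", "d"), nrow = 2)
dim(iconv(m, "latin1", "UTF-8"))
```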

     The elements of the result have a declared encoding if 'to' is
     '"latin1"' or '"UTF-8"', or if 'to = ""' and the current locale's
     encoding is detected as Latin-1 or UTF-8.
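
     A minimal sketch, assuming a recent R version: converting to
     '"UTF-8"' marks the non-ASCII element, while a pure-ASCII string
     never carries a declared encoding and so reports '"unknown"'.

```r
x <- c("abc", "fa\xE7ile")
Encoding(x) <- "latin1"
y <- iconv(x, "latin1", "UTF-8")
Encoding(y)
```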

_N_o_t_e:

     Not all platforms support these functions, although almost all
     support 'iconv'.  See also 'capabilities("iconv")'.

_S_e_e _A_l_s_o:

     'localeToCharset', 'file'.

_E_x_a_m_p_l_e_s:

     utils::head(iconvlist(), n = 50)

     ## Not run: 
     ## convert from Latin-2 to UTF-8: two of the glibc iconv variants.
     iconv(x, "ISO_8859-2", "UTF-8")
     iconv(x, "LATIN2", "UTF-8")
     ## End(Not run)

     ## Both x below are in latin1 and will only display correctly in a
     ## locale that can represent and display latin1.
     x <- "fa\xE7ile"
     Encoding(x) <- "latin1"
     x
     charToRaw(xx <- iconv(x, "latin1", "UTF-8"))
     xx

     iconv(x, "latin1", "ASCII")          #   NA
     iconv(x, "latin1", "ASCII", "?")     # "fa?ile"
     iconv(x, "latin1", "ASCII", "")      # "faile"
     iconv(x, "latin1", "ASCII", "byte")  # "fa<e7>ile"

     # Extracts from R help files
     x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
     Encoding(x) <- "latin1"
     x
     try(iconv(x, "latin1", "ASCII//TRANSLIT"))  # platform-dependent
     iconv(x, "latin1", "ASCII", sub="byte")

