charsets                package:tools                R Documentation

_C_o_n_v_e_r_s_i_o_n _T_a_b_l_e_s _b_e_t_w_e_e_n _C_h_a_r_a_c_t_e_r _S_e_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     'charset_to_Unicode' is a matrix of Unicode code points, with one
     column for each of the common 8-bit encodings.
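     As a minimal sketch of how the matrix can be used for decoding, the
     hypothetical helper 'decode_bytes()' below maps raw bytes in an 8-bit
     encoding to a UTF-8 string. It assumes that row 'i' of
     'charset_to_Unicode' holds the code point for byte value 'i - 1' (so
     the 256 rows cover bytes 0x00 to 0xFF), and it uses only the
     "ISOLatin2" column that also appears in the Examples.

```r
library(tools)  # charset_to_Unicode is a data set in package tools

## Hypothetical helper: decode raw bytes via one column of the table.
## Assumption: row i holds the Unicode code point for byte value i - 1.
decode_bytes <- function(bytes, encoding = "ISOLatin2") {
    map <- as.integer(charset_to_Unicode[, encoding])  # drop the hexmode class
    points <- map[as.integer(bytes) + 1L]
    if (anyNA(points)) stop("unmapped byte(s) in input")
    intToUtf8(points)  # one UTF-8 string built from the code points
}

decode_bytes(as.raw(c(0x41, 0xA1)))
```

     In ISO 8859-2, byte 0xA1 is LATIN CAPITAL LETTER A WITH OGONEK
     (U+0104), so the call should return "A" followed by that letter.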

     'Adobe_glyphs' is a data frame which gives the Adobe glyph names for
     Unicode code points. It has two character columns, '"adobe"' and
     '"unicode"' (the latter a 4-digit hexadecimal representation).
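     Because the '"unicode"' column is text, it has to be converted before
     it can be compared with code points. A minimal sketch, using base R's
     'strtoi()' for the hexadecimal conversion; the helper name
     'glyph_names()' is hypothetical and not part of package tools.

```r
library(tools)  # Adobe_glyphs is a data set in package tools

## Convert the 4-digit hex strings (e.g. "0041") to integer code points.
glyph_points <- strtoi(Adobe_glyphs$unicode, base = 16L)

## Hypothetical helper: all Adobe glyph names for a single character.
glyph_names <- function(ch) {
    stopifnot(nchar(ch) == 1L)
    Adobe_glyphs$adobe[glyph_points == utf8ToInt(ch)]
}

glyph_names("A")
glyph_names("\u0104")
```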

_U_s_a_g_e:

     charset_to_Unicode

     Adobe_glyphs

_D_e_t_a_i_l_s:

     'charset_to_Unicode' is an integer matrix of class 'c("noquote",
     "hexmode")', so it prints in hexadecimal. The mappings are those
     used by 'libiconv': sources differ in how quotes and minus/hyphen
     are mapped (and the PostScript encoding files use a different
     mapping).
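     Because of the 'c("noquote", "hexmode")' class, arithmetic and
     matching are simplest on a plain integer copy. The sketch below
     strips the class with 'as.integer()' and, as a rough consistency
     check, compares the upper half of the "ISOLatin2" column with R's own
     'iconv()'. It assumes that the platform's iconv accepts the name
     "ISO-8859-2" and, again, that row 'i' corresponds to byte 'i - 1'.
     Given the quote and hyphen differences noted above, agreement is not
     guaranteed for every encoding.

```r
library(tools)

print(class(charset_to_Unicode))  # the documented "noquote" "hexmode" class
latin2 <- as.integer(charset_to_Unicode[, "ISOLatin2"])  # plain integers

## Decode bytes 0xA0..0xFF with the system iconv and compare code points.
bytes <- as.raw(0xA0:0xFF)
utf8 <- iconv(as.list(bytes), from = "ISO-8859-2", to = "UTF-8")
via_iconv <- vapply(utf8, utf8ToInt, integer(1), USE.NAMES = FALSE)
agree <- via_iconv == latin2[as.integer(bytes) + 1L]
table(agree)
```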

     'Adobe_glyphs' includes all the Adobe glyph names that correspond
     to single Unicode characters.  It is sorted by Unicode point and,
     within a point, alphabetically by glyph name (there can be more
     than one name for a Unicode point).  The data are in the file
     'R_HOME/share/encodings/Adobe_glyphlist'.
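     The ordering and the one-to-many relationship described above can be
     inspected directly. A minimal sketch: it converts the hex column with
     'strtoi()', checks that the code points are non-decreasing, and groups
     the glyph names by code point to list the points with more than one
     name.

```r
library(tools)

points <- strtoi(Adobe_glyphs$unicode, base = 16L)
stopifnot(!is.unsorted(points))  # sorted by code point, as documented

## Code points that have more than one Adobe glyph name.
by_point <- split(Adobe_glyphs$adobe, points)
multi <- by_point[lengths(by_point) > 1L]
length(multi)
head(multi, 3)
```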

_S_o_u_r_c_e:

     <URL:
     http://partners.adobe.com/public/developer/en/opentype/glyphlist.txt>

_E_x_a_m_p_l_e_s:

     library(tools)  # the data sets live in package tools
     ## find Adobe names for ISOLatin2 chars.
     ## as.integer() drops the "hexmode" class so matching is on plain integers
     latin2 <- as.integer(charset_to_Unicode[, "ISOLatin2"])
     aUnicode <- strtoi(Adobe_glyphs$unicode, 16L)  # hex strings -> integers
     keep <- aUnicode %in% latin2
     aUnicode <- aUnicode[keep]
     aAdobe <- Adobe_glyphs$adobe[keep]
     ## first match for each of the 256 byte values
     aLatin2 <- aAdobe[match(latin2, aUnicode)]
     ## all matches
     bLatin2 <- lapply(seq_along(latin2), function(x) aAdobe[aUnicode %in% latin2[x]])
     format(bLatin2, justify = "none")
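     The same table can also be inverted with 'match()' to go from code
     points back to bytes. A minimal sketch: the helper 'encode_latin2()'
     is hypothetical, and it assumes, as above, that row 'i' of the
     "ISOLatin2" column corresponds to byte 'i - 1'.

```r
library(tools)

## Hypothetical helper: encode a string into ISO Latin-2 bytes by
## looking up each code point in the table (assumes row i <-> byte i - 1).
encode_latin2 <- function(x) {
    latin2 <- as.integer(charset_to_Unicode[, "ISOLatin2"])
    pos <- match(utf8ToInt(x), latin2)  # first row holding each code point
    if (anyNA(pos)) stop("character(s) not representable in ISO Latin-2")
    as.raw(pos - 1L)
}

encode_latin2("A\u0104")
```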

