strsplit                package:base                R Documentation

_S_p_l_i_t _t_h_e _E_l_e_m_e_n_t_s _o_f _a _C_h_a_r_a_c_t_e_r _V_e_c_t_o_r

_D_e_s_c_r_i_p_t_i_o_n:

     Split the elements of a character vector 'x' into substrings
     according to the presence of substring 'split' within them.

_U_s_a_g_e:

     strsplit(x, split, extended = TRUE, fixed = FALSE, perl = FALSE)

_A_r_g_u_m_e_n_t_s:

       x: character vector, each element of which is to be split. 

   split: character vector containing regular expression(s) (unless
          'fixed = TRUE') to use for splitting.  If empty matches occur,
          in particular if 'split' has length 0, 'x' is split into
          single characters.  If 'split' has length greater than 1, it
          is recycled along 'x'. 

extended: logical. If 'TRUE', extended regular expression matching is
          used; if 'FALSE', basic regular expressions are used. 

   fixed: logical. If 'TRUE', match 'split' exactly; otherwise use
          regular expressions. 

    perl: logical. Should Perl-compatible regexps be used? Has priority
          over 'extended'. 

_D_e_t_a_i_l_s:

     Arguments 'x' and 'split' will be coerced to character, so you
     will see uses with 'split = NULL' to mean 'split = character(0)',
     including in the examples below.

     Note that splitting into single characters can be done via
     'split=character(0)' or 'split=""'; the two are equivalent as from
     R 1.9.0.

     A missing value of 'split' does not split the corresponding
     element(s) of 'x' at all.

_V_a_l_u_e:

     A list of length 'length(x)', the 'i'-th element of which contains
     the vector of splits of 'x[i]'.

_W_a_r_n_i_n_g:

     The standard regular expression code has been reported to be very
     slow when applied to extremely long character strings (tens of
     thousands of characters or more); the code used when 'perl=TRUE'
     seems much faster and more reliable for such uses.

     The 'perl = TRUE' option is only implemented for single-byte and
     UTF-8 encodings, and will warn if used in a non-UTF-8 multibyte
     locale.

_S_e_e _A_l_s_o:

     'paste' for the reverse operation; 'grep' and 'sub' for string
     search and manipulation; also 'nchar' and 'substr'.

     'regular expression' for the details of the pattern specification.

_E_x_a_m_p_l_e_s:

     noquote(strsplit("A text I want to display with spaces", NULL)[[1]])
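
     ## A small sketch (not one of the original examples) of the
     ## equivalence described under Details: NULL, character(0) and ""
     ## should all split into single characters.  The name 's' is
     ## made up for illustration.
     s <- "abc"
     identical(strsplit(s, NULL), strsplit(s, character(0)))
     ## [1] TRUE
     identical(strsplit(s, ""), strsplit(s, character(0)))
     ## [1] TRUE
     strsplit(s, "")[[1]]
     ## [1] "a" "b" "c"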

     x <- c(as = "asfef", qu = "qwerty", "yuiop[", "b", "stuff.blah.yech")
     # split x on the letter e
     strsplit(x, "e")

     unlist(strsplit("a.b.c", "."))
     ## [1] "" "" "" "" ""
     ## Note that 'split' is a regexp!
     ## If you really want to split on '.', use
     unlist(strsplit("a.b.c", "\\."))
     ## [1] "a" "b" "c"
     ## or
     unlist(strsplit("a.b.c", ".", fixed = TRUE))
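
     ## A further sketch, using the made-up inputs 'y' and 'seps':
     ## 'split' is recycled along 'x', and the result is a list with
     ## one element per element of 'x'
     y <- c("a-b-c", "d:e", "f-g")
     seps <- c("-", ":")
     parts <- strsplit(y, seps, fixed = TRUE)
     length(parts) == length(y)
     ## [1] TRUE
     parts[[2]]
     ## [1] "d" "e"
     parts[[3]]   # 'seps' has been recycled, so "-" is used again
     ## [1] "f" "g"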

     ## a useful function: rev() for strings
     strReverse <- function(x)
             sapply(lapply(strsplit(x, NULL), rev), paste, collapse="")
     strReverse(c("abc", "Statistics"))

     ## get the first names of the members of R-core
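     a <- readLines(file.path(R.home(),"AUTHORS"))[-(1:8)]  # drop the first 8 lines
     a <- a[(0:2)-length(a)]   # negative indices: drop the last 3 lines
     (a <- sub(" .*","", a))   # keep only the text before the first space
     ## and reverse them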
     strReverse(a)
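
     ## A sketch of the 'perl = TRUE' route mentioned under Warning,
     ## applied to a made-up long string 'long' (120000 characters).
     ## No timings are claimed; this only shows the call.
     long <- paste(rep("ab12cd", 20000), collapse = "")
     pieces <- strsplit(long, "[0-9]+", perl = TRUE)[[1]]
     head(pieces, 3)
     ## [1] "ab"   "cdab" "cdab"
     length(pieces)
     ## [1] 20001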

