grep                  package:base                  R Documentation

_P_a_t_t_e_r_n _M_a_t_c_h_i_n_g _a_n_d _R_e_p_l_a_c_e_m_e_n_t

_D_e_s_c_r_i_p_t_i_o_n:

     'grep' searches for matches to 'pattern' (its first argument)
     within the character vector 'x' (second argument).  'regexpr' does
     too, but returns more detail in a different format.

     'sub' and 'gsub' perform replacement of matches determined by
     regular expression matching.

_U_s_a_g_e:

     grep(pattern, x, ignore.case = FALSE, extended = TRUE, perl = FALSE,
          value = FALSE, fixed = FALSE)
     sub(pattern, replacement, x,
         ignore.case = FALSE, extended = TRUE, perl = FALSE)
     gsub(pattern, replacement, x,
          ignore.case = FALSE, extended = TRUE, perl = FALSE)
     regexpr(pattern, text,  extended = TRUE, perl = FALSE, fixed = FALSE)

_A_r_g_u_m_e_n_t_s:

 pattern: character string containing a regular expression (or
          character string for 'fixed = TRUE') to be matched in the
          given character vector.

 x, text: a character vector where matches are sought.

ignore.case: if 'FALSE', the pattern matching is _case sensitive_ and
          if 'TRUE', case is ignored during matching.

extended: if 'TRUE', extended regular expression matching is used, and
          if 'FALSE' basic regular expressions are used.

    perl: logical.  Should Perl-compatible regular expressions be used?
          Takes priority over 'extended'.

   value: if 'FALSE', a vector containing the ('integer') indices of
          the matches determined by 'grep' is returned, and if 'TRUE',
          a vector containing the matching elements themselves is
          returned.

   fixed: logical.  If 'TRUE', 'pattern' is a string to be matched as
          is.  Overrides all other arguments.

replacement: a replacement for matched pattern in 'sub' and 'gsub'.
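
     As a small sketch of what 'fixed = TRUE' changes (the vector 'x'
     below is made up for illustration), compare how "." is treated as
     a regular expression and as a literal string:

```r
x <- c("a.b", "ab", "a+b")

## As a regular expression, "." matches any single character,
## so every non-empty element matches.
grep(".", x)                 # 1 2 3

## With fixed = TRUE, "." is matched as is.
grep(".", x, fixed = TRUE)   # 1
```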

_D_e_t_a_i_l_s:

     Arguments which should be character strings or character vectors
     are coerced to character if possible.

     The two '*sub' functions differ only in that 'sub' replaces only
     the first occurrence of a 'pattern' whereas 'gsub' replaces all
     occurrences.
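
     The difference can be seen on a small example (the input strings
     are made up for illustration); note that "first occurrence" applies
     to each element of 'x' separately:

```r
x <- c("banana", "papaya")

## sub: only the first "a" in each element is replaced
first <- sub("a", "_", x)     # "b_nana" "p_paya"

## gsub: every "a" in each element is replaced
all_a <- gsub("a", "_", x)    # "b_n_n_" "p_p_y_"
```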

     For 'regexpr' it is an error for 'pattern' to be 'NA'; otherwise
     'NA' is permitted and matches only itself.

     The regular expressions used are those specified by POSIX 1003.2,
     either extended or basic, depending on the value of the 'extended'
     argument, unless 'perl = TRUE', in which case they are those of
     PCRE, <URL: ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/>.
     (The exact set of patterns supported may depend on the version of
     PCRE installed on the system in use.)
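
     A minimal sketch contrasting a POSIX character class with
     Perl-style syntax.  The vector 'x' is made up for illustration,
     and the lookahead example assumes a PCRE build that supports it:

```r
x <- c("room 101", "no number", "42")

## POSIX character class
posix_hits <- grep("[[:digit:]]+", x)        # 1 3

## Perl-style shorthand for a digit
perl_hits <- grep("\\d+", x, perl = TRUE)    # 1 3

## Lookahead: replace "foo" only where it is followed by "bar"
res <- sub("foo(?=bar)", "X", "foobar foobaz", perl = TRUE)
res                                          # "Xbar foobaz"
```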

_V_a_l_u_e:

     For 'grep' a vector giving either the indices of the elements of
     'x' that yielded a match or, if 'value' is 'TRUE', the matched
     elements.
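
     For instance (with a made-up vector 'x'), indexing by the result
     of 'grep' gives the same elements as 'value = TRUE', and a pattern
     that matches nothing yields a zero-length result:

```r
x <- c("apple", "banana", "cherry")

i <- grep("an", x)                  # 2
by_index <- x[i]                    # "banana"
by_value <- grep("an", x, value = TRUE)

## No match: a zero-length vector, so test with length()
none <- grep("z", x)                # integer(0)
length(none) > 0                    # FALSE
```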

     For 'sub' and 'gsub' a character vector of the same length as
     'x'.

     For 'regexpr' an integer vector of the same length as 'text'
     giving the starting position of the first match, or -1 if there is
     none, with attribute '"match.length"' giving the length of the
     matched text (or -1 for no match).
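
     One way to use this result is to extract the matched text with
     'substring'.  The following is only a sketch; the vector 'words'
     and the helper variables are made up for illustration:

```r
words <- c("intended", "contents", "GNU")

m <- regexpr("en", words)
start <- as.vector(m)               # 4 5 -1
len <- attr(m, "match.length")      # 2 2 -1

## Keep only the elements that matched, then cut out the match
hit <- start > 0
matched <- substring(words[hit], start[hit], start[hit] + len[hit] - 1)
matched                             # "en" "en"
```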

_W_a_r_n_i_n_g:

     The standard regular-expression code has been reported to be very
     slow or to give errors when applied to extremely long character
     strings (tens of thousands of characters or more); the code used
     when 'perl = TRUE' seems faster and more reliable for such usages.

_R_e_f_e_r_e_n_c_e_s:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_. Wadsworth & Brooks/Cole ('grep').

_S_e_e _A_l_s_o:

     regular expression (aka 'regexp') for the details of the pattern
     specification.

     'agrep' for approximate matching.

     'tolower', 'toupper' and 'chartr' for character translations;
     'charmatch', 'pmatch' and 'match' for matching without regular
     expressions.  'apropos' uses regexps and has nice examples.

_E_x_a_m_p_l_e_s:

     grep("[a-z]", letters)

     txt <- c("arm","foot","lefroo", "bafoobar")
     if(length(i <- grep("foo", txt)))
        cat("'foo' appears at least once in\n\t",txt,"\n")
     i # 2 and 4
     txt[i]

     ## Double all 'a' or 'b's;  "\" must be escaped, i.e., 'doubled'
     gsub("([ab])", "\\1_\\1_", "abc and ABC")

     txt <- c("The", "licenses", "for", "most", "software", "are",
       "designed", "to", "take", "away", "your", "freedom",
       "to", "share", "and", "change", "it.",
        "", "By", "contrast,", "the", "GNU", "General", "Public", "License",
        "is", "intended", "to", "guarantee", "your", "freedom", "to",
        "share", "and", "change", "free", "software", "--",
        "to", "make", "sure", "the", "software", "is",
        "free", "for", "all", "its", "users")
     ( i <- grep("[gu]", txt) ) # indices
     stopifnot( txt[i] == grep("[gu]", txt, value = TRUE) )
     (ot <- sub("[b-e]",".", txt))
     txt[ot != gsub("[b-e]",".", txt)]#- gsub does "global" substitution

     txt[gsub("g","#", txt) !=
         gsub("g","#", txt, ignore.case = TRUE)] # the "G" words

     regexpr("en", txt)

     ## trim trailing white space
     str <- 'Now is the time      '
     sub(' +$', '', str)  ## spaces only
     sub('[[:space:]]+$', '', str) ## white space, POSIX-style
     sub('\\s+$', '', str, perl = TRUE) ## Perl-style white space

