duplicated               package:base               R Documentation

_D_e_t_e_r_m_i_n_e _D_u_p_l_i_c_a_t_e _E_l_e_m_e_n_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     Determines which elements of a vector or data frame are duplicates
     of elements with smaller subscripts, and returns a logical vector
     indicating which elements (rows) are duplicates.

_U_s_a_g_e:

     duplicated(x, incomparables = FALSE, ...)

     ## Default S3 method:
     duplicated(x, incomparables = FALSE,
                fromLast = FALSE, ...)

     ## S3 method for class 'array':
     duplicated(x, incomparables = FALSE, MARGIN = 1,
                fromLast = FALSE, ...)

_A_r_g_u_m_e_n_t_s:

       x: a vector or a data frame or an array or 'NULL'.

incomparables: a vector of values that cannot be compared. 'FALSE' is a
          special value, meaning that all values can be compared, and
          may be the only value accepted for methods other than the
          default.  The vector is coerced internally to the same type
          as 'x'.

fromLast: logical indicating whether duplication should be considered
          from the reverse side, i.e., whether the last (or rightmost)
          of a set of identical elements is the one reported as
          'FALSE' (not duplicated).

     ...: arguments for particular methods.

  MARGIN: the array margin to be held fixed: see 'apply'.

_D_e_t_a_i_l_s:

     This is a generic function with methods for vectors (including
     lists), data frames and arrays (including matrices).

     'duplicated(x, fromLast=TRUE)' is equivalent to but faster than
     'rev(duplicated(rev(x)))'.
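
     The equivalence above can be checked on a small vector.  The
     following is a minimal sketch; the sample vector and the variable
     names are illustrative assumptions, not part of the original
     documentation.

```r
# Hypothetical sample vector, chosen only for illustration.
x <- c("a", "b", "a", "c", "b")

# Default: the first occurrence of each value is reported as FALSE.
from_first <- duplicated(x)

# fromLast = TRUE: the last occurrence of each value is reported as FALSE.
from_last <- duplicated(x, fromLast = TRUE)

# The reversal idiom from the text gives the same answer.
stopifnot(identical(from_last, rev(duplicated(rev(x)))))

print(from_first)  # FALSE FALSE  TRUE FALSE  TRUE
print(from_last)   #  TRUE  TRUE FALSE FALSE FALSE
```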

     The data frame method works by pasting together a character
     representation of the rows separated by '\r', so it may give
     incorrect results if the data frame contains strings with embedded
     carriage returns or columns which do not reliably map to
     characters.
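
     As a rough illustration of the data frame method, the sketch below
     flags repeated rows in a small, made-up data frame (the column
     names and values are assumptions for this example only).  A row
     counts as a duplicate only when every column matches an earlier
     row.

```r
# Hypothetical data frame: row 3 repeats row 1; row 4 shares 'id' with
# row 2 but differs in 'label', so it is not a duplicate.
df <- data.frame(id = c(1, 2, 1, 2),
                 label = c("a", "b", "a", "c"),
                 stringsAsFactors = FALSE)

# One logical value per row.
dup_rows <- duplicated(df)
print(dup_rows)  # FALSE FALSE  TRUE FALSE

# Keep only the first occurrence of each distinct row.
deduped <- df[!dup_rows, ]
print(deduped)
```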

     The array method determines, for each element of the sub-array
     specified by 'MARGIN', whether the remaining dimensions are
     identical to those for an earlier (or later, when
     'fromLast=TRUE') element (in row-major order).  This is most
     commonly used to find duplicated rows (the default) or columns
     (with 'MARGIN = 2').
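
     The row and column cases can be sketched with a small matrix.  The
     matrix below is a hypothetical example constructed so that its
     third row repeats the first and its third column repeats the
     first.

```r
# Hypothetical 3 x 3 matrix, filled by row for readability.
m <- matrix(c(1, 2, 1,
              3, 4, 3,
              1, 2, 1), nrow = 3, byrow = TRUE)

# Default MARGIN = 1: compare whole rows.
dup_rows <- as.vector(duplicated(m))
print(dup_rows)  # FALSE FALSE  TRUE

# MARGIN = 2: compare whole columns.
dup_cols <- as.vector(duplicated(m, MARGIN = 2))
print(dup_cols)  # FALSE FALSE  TRUE

# Drop the repeated row and the repeated column.
m_unique <- m[!dup_rows, !dup_cols, drop = FALSE]
print(m_unique)
```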

     Missing values are regarded as equal, but 'NaN' is not equal to
     'NA_real_'.
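
     A short sketch of the missing-value rule, using a made-up numeric
     vector: repeated 'NA' values are flagged as duplicates of each
     other, and so are repeated 'NaN' values, but an 'NaN' is not
     treated as a duplicate of an 'NA'.

```r
# Hypothetical numeric vector mixing NA and NaN.
x <- c(NA, 1, NA, NaN, NaN, NA_real_)

dups <- duplicated(x)
# Position 3 repeats the NA at position 1; position 4 is the first NaN
# (not a duplicate of NA); position 5 repeats that NaN; position 6 is
# another NA.
print(dups)  # FALSE FALSE  TRUE FALSE  TRUE  TRUE
```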

     Values in 'incomparables' will never be marked as duplicated. This
     is intended to be used for a fairly small set of values and will
     not be efficient for a very large set.
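
     The effect of 'incomparables' can be sketched with the default
     (vector) method.  The values below are illustrative assumptions;
     as noted under Arguments, methods other than the default may
     accept only 'FALSE'.

```r
# Hypothetical vector with repeated 2s, NAs and 3s.
x <- c(1, 2, 2, NA, NA, 3, 3)

# All values comparable: every repeat is flagged.
all_dups <- duplicated(x)
print(all_dups)      # FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE

# Treat NA as incomparable: repeated NAs are no longer flagged.
na_exempt <- duplicated(x, incomparables = NA)
print(na_exempt)     # FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE

# Treat both 2 and NA as incomparable: only the repeated 3 is flagged.
two_na_exempt <- duplicated(x, incomparables = c(2, NA))
print(two_na_exempt) # FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
```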

_V_a_l_u_e:

     For a vector input, a logical vector of the same length as 'x'. 
     For a data frame, a logical vector with one element for each row. 
     For a matrix or array, a logical array with the same dimensions
     and dimnames.

_W_a_r_n_i_n_g:

     Using this for lists is potentially slow, especially if the
     elements are not atomic vectors (see 'vector') or differ only in
     their attributes.  In the worst case it is O(n^2).

_R_e_f_e_r_e_n_c_e_s:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_. Wadsworth & Brooks/Cole.

_S_e_e _A_l_s_o:

     'unique'.

_E_x_a_m_p_l_e_s:

     x <- c(9:20, 1:5, 3:7, 0:8)
     ## extract unique elements
     (xu <- x[!duplicated(x)])
     ## similar, but not the same:
     (xu2 <- x[!duplicated(x, fromLast = TRUE)])

     ## xu == unique(x) but unique(x) is more efficient
     stopifnot(identical(xu,  unique(x)),
               identical(xu2, unique(x, fromLast = TRUE)))

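     ## data frame method: one logical value per row
     duplicated(iris)[140:143]

     ## array method: compare the measurement vectors for each
     ## (case, species) pair
     duplicated(iris3, MARGIN = c(1, 3))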

