tapply                 package:base                 R Documentation

_A_p_p_l_y _a _F_u_n_c_t_i_o_n _O_v_e_r _a "_R_a_g_g_e_d" _A_r_r_a_y

_D_e_s_c_r_i_p_t_i_o_n:

     Apply a function to each cell of a ragged array, that is, to each
     (non-empty) group of values given by a unique combination of the
     levels of certain factors.

_U_s_a_g_e:

     tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

_A_r_g_u_m_e_n_t_s:

       X: an atomic object, typically a vector.

   INDEX: list of factors, each of same length as 'X'.  The elements
          are coerced to factors by 'as.factor'.

     FUN: the function to be applied.  In the case of functions like
          '+', '%*%', etc., the function name must be quoted.  If 'FUN'
          is 'NULL', 'tapply' returns a vector that can be used to
          subscript the multi-way array 'tapply' normally produces.

     ...: optional arguments to 'FUN': see the Note section.

simplify: If 'FALSE', 'tapply' always returns an array of mode
          '"list"'.  If 'TRUE' (the default), then if 'FUN' always
          returns a scalar, 'tapply' returns an array with the mode of
          the scalar.
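
     As a small sketch of the 'FUN = NULL' case described above, the
     call below returns, for each element of 'X', the position of its
     cell in the array that 'tapply' would otherwise build.  The data
     and the name 'grp' are made up for illustration.

```r
# Illustrative index: two grouping vectors, coerced to factors by tapply.
ind <- list(c(1, 2, 2), c("A", "A", "B"))

# With FUN omitted (NULL), the result is one cell position per element.
grp <- tapply(1:3, ind)

# The same positions can subscript the array from a regular call.
sums <- tapply(1:3, ind, sum)
sums[grp]
```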

_V_a_l_u_e:

     When 'FUN' is present, 'tapply' calls 'FUN' for each cell that has
     any data in it.  If 'FUN' returns a single atomic value for each
     such cell (e.g., functions 'mean' or 'var') and when 'simplify' is
     'TRUE', 'tapply' returns a multi-way array containing the values,
     and 'NA' for the empty cells.  The array has the same number of
     dimensions as 'INDEX' has components; the number of levels in a
     dimension is the number of levels ('nlevels()') in the
     corresponding component of 'INDEX'.  Note that if the return value
     has a class (e.g. an object of class '"Date"') the class is
     discarded.
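
     A minimal sketch of the simplified case, using made-up data and
     the hypothetical names 'site' and 'year': two factors give a
     two-dimensional result, and the one combination with no data is
     filled with 'NA'.

```r
# Hypothetical data: five values cross-classified by two factors.
x    <- c(10, 20, 30, 40, 50)
site <- factor(c("a", "a", "b", "b", "b"))
year <- factor(c("y1", "y2", "y1", "y1", "y1"), levels = c("y1", "y2"))

res <- tapply(x, list(site, year), mean)

dim(res)          # one dimension per factor, sized by its number of levels
res["b", "y2"]    # no observations fall in this cell, so it is NA
```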

     Note that contrary to S, 'simplify = TRUE' always returns an
     array, possibly 1-dimensional.

     If 'FUN' does not return a single atomic value, 'tapply' returns
     an array of mode 'list' whose components are the values of the
     individual calls to 'FUN', i.e., the result is a list with a 'dim'
     attribute.
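
     For illustration, a function such as 'range' returns two values
     per cell, so the result cannot be simplified.  The sketch below
     uses made-up data to show the list-with-'dim' structure.

```r
# Hypothetical data: six values in two groups.
x <- 1:6
g <- factor(c("a", "a", "a", "b", "b", "b"))

res <- tapply(x, g, range)

mode(res)     # "list"
dim(res)      # a 1-d array with one entry per level of g
res[["a"]]    # the value returned by range() for group "a"
```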

     When there is an array answer, its 'dimnames' are named by the
     names of 'INDEX' and are based on the levels of the grouping
     factors (possibly after coercion).
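
     A short sketch of how the 'dimnames' arise when 'INDEX' is a named
     list.  The component names 'grp' and 'kind' are invented for this
     example; the character vectors are coerced to factors.

```r
# Hypothetical data with a named INDEX list.
x   <- c(1, 2, 3, 4)
idx <- list(grp = c("u", "u", "v", "v"), kind = c("p", "q", "p", "q"))

res <- tapply(x, idx, sum)

names(dimnames(res))   # taken from the names of INDEX
dimnames(res)$grp      # the levels of the first grouping factor
```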

     For a list result, the elements corresponding to empty cells are
     'NULL'.
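
     The sketch below, again with made-up data, uses a factor with an
     unused level to show the 'NULL' entry of a list result.

```r
# Level "c" is declared but never occurs, so its cell is empty.
fac <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))

res <- tapply(1:3, fac, sum, simplify = FALSE)

res[["a"]]            # sum of the values in group "a"
is.null(res[["c"]])   # the empty cell holds NULL
```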

_N_o_t_e:

     Optional arguments to 'FUN' supplied by the '...' argument are not
     divided into cells.  It is therefore inappropriate for 'FUN' to
     expect additional arguments with the same length as 'X'.
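
     One possible workaround, sketched here under the assumption that a
     per-observation weight vector 'w' (hypothetical) is needed inside
     'FUN': apply 'tapply' to the indices of 'X' and subset both
     vectors within the function, rather than passing 'w' through
     '...'.

```r
# Hypothetical data: values, per-observation weights, and a grouping factor.
x <- c(1, 2, 3, 4)
w <- c(1, 3, 1, 1)
g <- factor(c("a", "a", "b", "b"))

# Passing w via '...' would give every cell the full-length w.
# Grouping the indices instead lets FUN pick matching pieces of x and w.
res <- tapply(seq_along(x), g, function(i) weighted.mean(x[i], w[i]))
res
```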

_R_e_f_e_r_e_n_c_e_s:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_. Wadsworth & Brooks/Cole.

_S_e_e _A_l_s_o:

     the convenience functions 'by' and 'aggregate' (using 'tapply');
     'apply', 'lapply' with its versions 'sapply' and 'mapply'.

_E_x_a_m_p_l_e_s:

     require(stats)
     groups <- as.factor(rbinom(32, n = 5, prob = 0.4))
     tapply(groups, groups, length) #- is almost the same as
     table(groups)

     ## contingency table from data.frame : array with named dimnames
     tapply(warpbreaks$breaks, warpbreaks[,-1], sum)
     tapply(warpbreaks$breaks, warpbreaks[, 3, drop = FALSE], sum)

     n <- 17; fac <- factor(rep(1:3, length.out = n), levels = 1:5)
     table(fac)
     tapply(1:n, fac, sum)
     tapply(1:n, fac, sum, simplify = FALSE)
     tapply(1:n, fac, range)
     tapply(1:n, fac, quantile)

     ## example of ... argument: find quarterly means
     tapply(presidents, cycle(presidents), mean, na.rm = TRUE)

     ind <- list(c(1, 2, 2), c("A", "A", "B"))
     table(ind)
     tapply(1:3, ind) #-> the split vector
     tapply(1:3, ind, sum)

