split                  package:base                  R Documentation

_D_i_v_i_d_e _i_n_t_o _G_r_o_u_p_s _a_n_d _R_e_a_s_s_e_m_b_l_e

_D_e_s_c_r_i_p_t_i_o_n:

     'split' divides the data in the vector 'x' into the groups defined
     by 'f'.  The replacement forms replace values corresponding to
     such a division.  'unsplit' reverses the effect of 'split'.
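
     Here is a minimal sketch of the round trip. The vectors 'x' and 'f'
     are illustrative assumptions, not part of the original page.

```r
## Illustrative data (assumed for this sketch): six values, two groups
x <- c(10, 20, 30, 40, 50, 60)
f <- c("a", "b", "a", "b", "a", "b")

parts <- split(x, f)        # list(a = c(10, 30, 50), b = c(20, 40, 60))
back  <- unsplit(parts, f)  # values put back in their original positions
```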

_U_s_a_g_e:

     split(x, f, drop = FALSE, ...)
     split(x, f, drop = FALSE, ...) <- value
     unsplit(value, f, drop = FALSE)

_A_r_g_u_m_e_n_t_s:

       x: vector or data frame containing values to be divided into
          groups.

       f: a 'factor' in the sense that 'as.factor(f)' defines the
          grouping, or a list of such factors in which case their
          interaction is used for the grouping.

    drop: logical indicating if levels that do not occur should be
          dropped (if 'f' is a 'factor' or a list).

   value: a list of vectors or data frames compatible with a splitting
          of 'x'. Recycling applies if the lengths do not match.

     ...: further potential arguments passed to methods.
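
     The sketch below passes a list of two factors as 'f'. Their
     interaction forms the groups, and 'drop' controls whether a
     combination that never occurs is kept. The factors 'f1' and 'f2'
     are assumptions made for this example. Component names follow
     'interaction()' with its default separator '.'.

```r
## Illustrative factors (assumed); the combination (b, y) never occurs
x  <- 1:6
f1 <- factor(c("a", "a", "b", "b", "a", "a"))
f2 <- factor(c("x", "y", "x", "x", "x", "y"))

s_all  <- split(x, list(f1, f2))               # a.x, b.x, a.y, b.y (b.y empty)
s_used <- split(x, list(f1, f2), drop = TRUE)  # a.x, b.x, a.y only
```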

_D_e_t_a_i_l_s:

     'split' and 'split<-' are generic functions with default and
     'data.frame' methods.  The data frame method can also be used to
     split a matrix into a list of matrices (one per group of rows),
     and the same holds for the replacement form, provided the method
     is invoked explicitly, e.g. 'split.data.frame(m, f)'.
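
     This sketch shows the difference on an illustrative 3 x 2 matrix
     'm' with an assumed row grouping 'g'. The default method splits the
     matrix elements as a plain vector and recycles 'g'. An explicit
     call to the data frame method keeps each row intact.

```r
m <- matrix(1:6, nrow = 3)  # rows: (1, 4), (2, 5), (3, 6)
g <- c(1, 2, 1)             # illustrative grouping of the rows (assumed)

by_elem <- split(m, g)             # default method: elements, 'g' recycled
by_row  <- split.data.frame(m, g)  # explicit call: one sub-matrix per group
```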

     'unsplit' works with lists of vectors or data frames (assumed to
     have compatible structure, as if created by 'split'). It puts
     elements or rows back in the positions given by 'f'. In the data
     frame case, row names are obtained by unsplitting the row name
     vectors from the elements of 'value'.
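
     In this example, splitting and then unsplitting restores the
     original row order and row names. The data frame 'df' and the
     grouping 'grp' are assumptions made for this sketch.

```r
## Illustrative data frame and grouping (assumed for this sketch)
df  <- data.frame(val = c(4, 1, 3, 2),
                  row.names = c("r1", "r2", "r3", "r4"))
grp <- c("B", "A", "B", "A")

pieces <- split(df, grp)         # pieces$A: rows r2, r4; pieces$B: rows r1, r3
back   <- unsplit(pieces, grp)   # rows and row names back in original order
```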

     'f' is recycled as necessary, and if the length of 'x' is not a
     multiple of the length of 'f', a warning is issued.
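
     The following sketch illustrates the recycling rule with assumed
     inputs. A grouping of length 2 divides a vector of length 6 evenly.
     A vector of length 5 is still split, but the call raises the
     warning.

```r
## 'f' = 1:2 is recycled along 'x'; inputs are illustrative only
s6 <- split(1:6, 1:2)                    # "1" = 1 3 5, "2" = 2 4 6
s5 <- suppressWarnings(split(1:5, 1:2))  # "1" = 1 3 5, "2" = 2 4

## Detect the warning raised when length(x) is not a multiple of length(f)
warned <- tryCatch({ split(1:5, 1:2); FALSE },
                   warning = function(w) TRUE)
```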

     Any missing values in 'f' are dropped together with the
     corresponding values of 'x'.
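
     In this example, an 'NA' in 'f' removes the matching element of 'x'
     from every group. The inputs are illustrative assumptions.

```r
## Illustrative inputs (assumed): the second label is missing
x <- c(1, 2, 3, 4)
f <- c("a", NA, "b", "a")

s <- split(x, f)   # list(a = c(1, 4), b = 3); the value 2 is dropped
```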

_V_a_l_u_e:

     The value returned from 'split' is a list of vectors (or, for the
     data frame method, data frames) containing the values for the
     groups.  The components of the list are named by the levels of 'f'
     (after converting to a factor, or, if 'f' is already a factor and
     'drop = TRUE', after dropping unused levels).
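
     The sketch below uses an assumed factor 'f' with an unused level.
     It shows that component names come from the levels and that
     dropping removes the empty group. It also shows that a numeric
     grouping yields character names.

```r
## Illustrative factor (assumed) with an unused level "mid"
f <- factor(c("lo", "lo", "hi"), levels = c("lo", "mid", "hi"))

keep    <- split(1:3, f)                # names lo, mid, hi; 'mid' is empty
dropped <- split(1:3, f, drop = TRUE)   # names lo, hi
num     <- split(c(2.5, 3.5), c(1, 2))  # numeric grouping: names "1", "2"
```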

     The replacement forms return their right-hand side.  'unsplit'
     returns a vector or data frame for which 'split(x, f)' equals
     'value'.
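
     The following sketch uses assumed inputs 'x' and 'g'. It shows
     three things. First, the replacement form updates its target group
     by group. Second, the assignment expression itself evaluates to its
     right-hand side, with each value recycled within its group. Third,
     'unsplit' undoes 'split'.

```r
## Illustrative inputs (assumed for this sketch)
x <- c(5, 1, 7, 3)
g <- c("u", "v", "u", "v")

## Replacement form: centre each group in place
centred <- x
split(centred, g) <- lapply(split(x, g), function(v) v - mean(v))

## The assignment evaluates to its right-hand side, list(0, 9);
## 0 is recycled over group "u" and 9 over group "v"
filled <- x
rhs <- (split(filled, g) <- list(0, 9))

## unsplit() inverts split() for the same grouping
restored <- unsplit(split(x, g), g)
```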

_R_e_f_e_r_e_n_c_e_s:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_. Wadsworth & Brooks/Cole.

_S_e_e _A_l_s_o:

     'cut'

_E_x_a_m_p_l_e_s:

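     require(stats); require(graphics)
     n <- 10; nn <- 100
     ## 'g' assigns each of the n * nn observations to one of the groups 0..n
     g <- factor(round(n * stats::runif(n * nn)))
     ## the mean of 'x' increases with the level of 'g'
     x <- rnorm(n * nn) + sqrt(as.numeric(g))
     xg <- split(x, g)
     boxplot(xg, col = "lavender", notch = TRUE, varwidth = TRUE)
     sapply(xg, length)   # group sizes
     sapply(xg, mean)     # group means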

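     ### Calculate z-scores by group (standardize to mean 0, sd 1)

     z <- unsplit(lapply(split(x, g), scale), g)
     tapply(z, g, mean)   # all (numerically) zero

     # or, using the replacement form

     z <- x
     split(z, g) <- lapply(split(x, g), scale)
     tapply(z, g, sd)     # all equal to one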

     ### data frame variation

     ## The replacement form is not used here, since a new variable
     ## (Oz.Z) is being added to each piece.

     g <- airquality$Month
     l <- split(airquality, g)
     l <- lapply(l, transform, Oz.Z = scale(Ozone))
     aq2 <- unsplit(l, g)
     head(aq2)
     with(aq2, tapply(Oz.Z, Month, sd, na.rm = TRUE))

     ### Split a matrix into a list by columns
     ma <- cbind(x = 1:10, y = (-4:5)^2)
     split(ma, col(ma))

     ### 'f' is recycled: odd and even positions form the two groups
     split(1:10, 1:2)

