reshape                package:stats                R Documentation

_R_e_s_h_a_p_e _G_r_o_u_p_e_d _D_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     This function reshapes a data frame between 'wide' format with
     repeated measurements in separate columns of the same record and
     'long' format with the repeated measurements in separate records.

_U_s_a_g_e:

     reshape(data, varying = NULL, v.names = NULL, timevar = "time",
             idvar = "id", ids = 1:NROW(data),
             times = seq(length = length(varying[[1]])),
             drop = NULL, direction, new.row.names = NULL,
             split = list(regexp="\\.", include=FALSE))

_A_r_g_u_m_e_n_t_s:

    data: a data frame

 varying: names of sets of variables in the wide format that correspond
          to single variables in long format ('time-varying').  A list
          of vectors (or optionally a matrix for 'direction="wide"'). 
          See below for more details and options.

 v.names: names of variables in the long format that correspond to
          multiple variables in the wide format.

 timevar: the variable in long format that differentiates multiple
          records from the same group or individual.

   idvar: the variable in long format that identifies multiple records
          from the same group/individual.  This variable may also be
          present in wide format.

     ids: the values to use for a newly created 'idvar' variable in
          long format.

   times: the values to use for a newly created 'timevar' variable in
          long format.

    drop: a vector of names of variables to drop before reshaping

direction: character string, either '"wide"' to reshape to wide format,
          or '"long"' to reshape to long format.

new.row.names: character vector or 'NULL'; a non-'NULL' value is used
          as the row names of the result.

   split: information for guessing the 'varying', 'v.names', and
          'times' arguments.  See below for details.
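
     As an illustration of how several of these arguments combine, the
     sketch below reshapes a small made-up data frame to long format.
     The data and all column names ('subject', 'note', 'bp_pre',
     'phase', and so on) are hypothetical and chosen only for this
     example.

```r
# Hypothetical wide data: two measurements (bp, hr) at two phases.
wide_bp <- data.frame(subject = c("a", "b"),
                      sex     = c("F", "M"),
                      note    = c("n1", "n2"),
                      bp_pre  = c(120, 130), bp_post = c(115, 128),
                      hr_pre  = c(70, 80),   hr_post = c(68, 75))

long_bp <- reshape(wide_bp, direction = "long",
                   # one vector of wide column names per long variable
                   varying = list(c("bp_pre", "bp_post"),
                                  c("hr_pre", "hr_post")),
                   v.names = c("bp", "hr"),   # names of the long variables
                   timevar = "phase",         # new column holding the times
                   times   = c("pre", "post"),
                   idvar   = "subject",
                   drop    = "note")          # removed before reshaping
long_bp
```

     Each element of 'varying' lists its columns in the same order as
     'times', so 'bp_pre' and 'hr_pre' both end up in the records with
     'phase' equal to '"pre"'.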

_D_e_t_a_i_l_s:

     The arguments to this function are described in terms of
     longitudinal data, as that is the application motivating the
     functions.  A 'wide' longitudinal dataset will have one record for
     each individual with some time-constant variables that occupy
     single columns and some time-varying variables that occupy a
     column for each time point.  In 'long' format there will be
     multiple records for each individual, with some variables being
     constant across these records and others varying across the
     records.  A 'long' format dataset also needs a 'time' variable
     identifying which time point each record comes from and an 'id'
     variable showing which records refer to the same person.
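
     To make the two layouts concrete, the sketch below writes the same
     made-up measurements out by hand in both forms, without calling
     'reshape'.  The data and the column names ('sex', 'weight') are
     hypothetical.

```r
# 'Wide': one record per individual, one column per time point.
layout_wide <- data.frame(id = 1:2,
                          sex = c("F", "M"),        # time-constant
                          weight.1 = c(60, 80),     # time-varying, time 1
                          weight.2 = c(61, 79))     # time-varying, time 2

# 'Long': one record per individual per time point.
layout_long <- data.frame(id     = c(1, 2, 1, 2),
                          sex    = c("F", "M", "F", "M"),
                          time   = c(1, 1, 2, 2),
                          weight = c(60, 80, 61, 79))

layout_wide
layout_long
```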

     If the data frame resulted from a previous 'reshape' then the
     operation can be reversed by specifying just the 'direction'
     argument.  The other arguments are stored as attributes on the
     data frame.
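
     A minimal sketch of such a round trip, using a small made-up data
     frame (the names 'visits', 'visits_wide' and 'visits_back' are
     hypothetical).  The names of the stored attributes are an
     implementation detail, so the code only lists them rather than
     relying on a particular name.

```r
# Hypothetical long data: 3 individuals, 2 time points each.
visits <- data.frame(id   = rep(1:3, each = 2),
                     time = rep(1:2, times = 3),
                     x    = c(5, 6, 7, 8, 9, 10))

visits_wide <- reshape(visits, v.names = "x", idvar = "id",
                       timevar = "time", direction = "wide")

# The reshaping information travels with the result as an attribute.
names(attributes(visits_wide))

# Only 'direction' is needed to go back to long format.
visits_back <- reshape(visits_wide, direction = "long")
visits_back
```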

     If 'direction="wide"' and no 'varying' or 'v.names' arguments are
     supplied it is assumed that all variables except 'idvar' and
     'timevar' are time-varying. They are all expanded into multiple
     variables in wide format.
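
     A short sketch of this default when reshaping to wide format, on
     made-up data with two measured variables, 'x' and 'y' (both names
     hypothetical).  Neither 'varying' nor 'v.names' is given, so both
     are treated as time-varying.

```r
# Hypothetical long data with two measured variables.
obs <- data.frame(id   = rep(1:3, each = 2),
                  time = rep(1:2, times = 3),
                  x    = c(5, 6, 7, 8, 9, 10),
                  y    = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6))

# Everything except 'id' and 'time' is expanded, one column per time.
obs_wide <- reshape(obs, idvar = "id", timevar = "time",
                    direction = "wide")
names(obs_wide)
```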

     If 'direction="long"' the 'varying' argument can be a vector of
     column names or column numbers (converted to column names). The
     function will attempt to guess the 'v.names' and 'times' from
     these names.  The default is variable names like 'x.1', 'x.2',
     where 'split=list(regexp="\\.",include=FALSE)' specifies to split
     at the dot and drop it from the name. For alphabetic names
     followed by numeric times use
     'split=list(regexp="[A-Za-z][0-9]",include=TRUE)'. This splits
     between the alphabetic and numeric parts of the name and does not
     drop the regular expression.
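
     The two naming schemes can be sketched as follows when reshaping
     to long format, on made-up data (the data frames 'scores' and
     'doses' and their columns are hypothetical).  The first call
     relies on the default split at the dot; the second passes 'split'
     explicitly for names such as 'w1', 'w2'.

```r
# Names of the form 'x.1', 'x.2': the default split at the dot applies.
scores <- data.frame(id = 1:2, x.1 = c(10, 20), x.2 = c(11, 21))
long_dot <- reshape(scores, direction = "long", varying = 2:3)
long_dot

# Names of the form 'w1', 'w2': split between the letter and the digit
# and keep both characters (include = TRUE).
doses <- data.frame(id = 1:2, w1 = c(1, 2), w2 = c(3, 4))
long_alnum <- reshape(doses, direction = "long", varying = 2:3,
                      split = list(regexp = "[a-z][0-9]", include = TRUE))
long_alnum
```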

_V_a_l_u_e:

     The reshaped data frame with added attributes to simplify
     reshaping back to the original form.

_S_e_e _A_l_s_o:

     'stack', 'aperm'

_E_x_a_m_p_l_e_s:

     data(Indometh)
     summary(Indometh)
     wide <- reshape(Indometh, v.names="conc", idvar="Subject",
                     timevar="time", direction="wide")
     wide

     reshape(wide, direction="long")
     reshape(wide, idvar="Subject", varying=list(names(wide)[2:12]),
             v.names="conc", direction="long")

     ## times need not be numeric
     df <- data.frame(id=rep(1:4,rep(2,4)), visit=I(rep(c("Before","After"),4)),
                   x=rnorm(4), y=runif(4))
     df
     reshape(df, timevar="visit", idvar="id", direction="wide")
     ## warns that y is really varying
     reshape(df, timevar="visit", idvar="id", direction="wide", v.names="x")

     ##  unbalanced 'long' data leads to NA fill in 'wide' form
     df2 <- df[1:7,]
     df2
     reshape(df2, timevar="visit", idvar="id", direction="wide")

     ## Alternative regular expressions for guessing names
     df3 <- data.frame(id=1:4, age=c(40,50,60,50), dose1=c(1,2,1,2),
                       dose2=c(2,1,2,1), dose4=c(3,3,3,3))
     reshape(df3, direction="long", varying=3:5,
             split=list(regexp="[a-z][0-9]", include=TRUE))

     ## an example that isn't longitudinal data
     data(state)
     state.x77 <- as.data.frame(state.x77)
     long <- reshape(state.x77, idvar="state", ids=row.names(state.x77),
                     times=names(state.x77), timevar="Characteristic",
                     varying=list(names(state.x77)), direction="long")

     reshape(long, direction="wide")

     reshape(long, direction="wide", new.row.names=unique(long$state))

