formula                package:stats                R Documentation

_M_o_d_e_l _F_o_r_m_u_l_a_e

_D_e_s_c_r_i_p_t_i_o_n:

     The generic function 'formula' and its specific methods provide a
     way of extracting formulae which have been included in other
     objects.

     'as.formula' is almost identical, additionally preserving
     attributes when 'object' already inherits from '"formula"'.  The
     default value of the 'env' argument is used only when the formula
     would otherwise lack an environment.

_U_s_a_g_e:

     formula(x, ...)
     as.formula(object, env = parent.frame())

_A_r_g_u_m_e_n_t_s:

x, object: R object.

     ...: further arguments passed to or from other methods.

     env: the environment to associate with the result.

_D_e_t_a_i_l_s:

     The models fit by, e.g., the 'lm' and 'glm' functions are
     specified in a compact symbolic form. The '~' operator is basic in
     the formation of such models. An expression of the form 'y ~
     model' is interpreted as a specification that the response 'y' is
     modelled by a linear predictor specified symbolically by 'model'.
     Such a model consists of a series of terms separated by '+'
     operators. The terms themselves consist of variable and factor
     names separated by ':' operators. Such a term is interpreted as
     the interaction of all the variables and factors appearing in the
     term.
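
     As a minimal sketch of this decomposition, 'terms' can be used to
     list the terms of a formula and the order of each one.  The object
     names 'fo_basic' and 'tt_basic' are illustrative only.

```r
# Break a simple formula into its '+'-separated terms.
fo_basic <- y ~ a + b + a:b
tt_basic <- terms(fo_basic)

attr(tt_basic, "term.labels")  # one label per term
attr(tt_basic, "order")        # 1 for a main effect, 2 for a two-way interaction
attr(tt_basic, "response")     # position of the response among the variables
```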

     In addition to '+' and ':', a number of other operators are useful
     in model formulae.  The '*' operator denotes factor crossing:
     'a*b' is interpreted as 'a+b+a:b'.  The '^' operator indicates
     crossing to the specified degree.  For example '(a+b+c)^2' is
     identical to '(a+b+c)*(a+b+c)' which in turn expands to a formula
     containing the main effects for 'a', 'b' and 'c' together with
     their second-order interactions. The '%in%' operator indicates
     that the terms on its left are nested within those on the right. 
     For example 'a + b %in% a' expands to the formula 'a + a:b'.  The
     '-' operator removes the specified terms, so that '(a+b+c)^2 -
     a:b' is identical to 'a + b + c + b:c + a:c'.  It can also be used
     to remove the intercept term: 'y ~ x - 1' is a line through the
     origin.  A model with no intercept can also be specified as 'y ~ x
     + 0' or 'y ~ 0 + x'.
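
     The expansions described above can be checked without fitting a
     model, by inspecting the result of 'terms'.  The following is a
     sketch only; 'labels_of' is a hypothetical helper defined here for
     illustration and is not part of the stats package.

```r
# Hypothetical helper: return the expanded term labels of a formula.
labels_of <- function(fo) attr(terms(fo), "term.labels")

labels_of(y ~ a * b)                 # crossing: main effects plus a:b
labels_of(y ~ (a + b + c)^2)         # all terms up to second order
labels_of(y ~ (a + b + c)^2 - a:b)   # the same, with a:b removed

# The "intercept" attribute is 1 when an intercept is present, 0 otherwise.
attr(terms(y ~ x), "intercept")
attr(terms(y ~ x - 1), "intercept")
attr(terms(y ~ x + 0), "intercept")
attr(terms(y ~ 0 + x), "intercept")
```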

     While formulae usually involve just variable and factor names,
     they can also involve arithmetic expressions. The formula 'log(y)
     ~ a + log(x)' is quite legal. When such arithmetic expressions
     involve operators which are also used symbolically in model
     formulae, there can be confusion between arithmetic and symbolic
     operator use.

     To avoid this confusion, the function 'I()' can be used to bracket
     those portions of a model formula where the operators are used in
     their arithmetic sense.  For example, in the formula 'y ~ a +
     I(b+c)', the term 'b+c' is to be interpreted as the sum of 'b' and
     'c'.
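
     The difference between the symbolic and the arithmetic reading of
     '+' can be seen in the columns of the model matrix.  This is a
     sketch using a small made-up data frame, 'd_arith', whose values
     carry no meaning.

```r
# Made-up data, for illustration only.
d_arith <- data.frame(y = c(1, 3, 2, 5, 4),
                      a = c(1, 2, 3, 4, 5),
                      b = c(2, 1, 4, 3, 6),
                      c = c(1, 0, 1, 0, 2))

# Symbolic '+': b and c enter as two separate terms.
mm_sym <- model.matrix(y ~ a + b + c, data = d_arith)
colnames(mm_sym)

# Arithmetic '+': I() protects b + c, which enters as a single column.
mm_sum <- model.matrix(y ~ a + I(b + c), data = d_arith)
colnames(mm_sum)
```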

     Variable names can be quoted by backticks '`like this`' in
     formulae, although there is no guarantee that all code using
     formulae will accept such non-syntactic names.
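
     For instance, a column whose name contains a space can be referred
     to with backticks.  The data frame 'd_bt' and its column name
     'body mass' are made up for this sketch.

```r
# check.names = FALSE keeps the non-syntactic column name as given.
d_bt <- data.frame(y = c(1, 2, 4, 3),
                   `body mass` = c(10, 20, 30, 40),
                   check.names = FALSE)

fo_bt <- y ~ `body mass`
all.vars(fo_bt)                 # variable names, without the backticks

fit_bt <- lm(fo_bt, data = d_bt)
coef(fit_bt)
```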

     Most model-fitting functions accept formulae whose right-hand side
     includes the function 'offset' to indicate terms with a fixed
     coefficient of one.  Some functions accept other 'specials' such
     as 'strata' or 'cluster' (see the 'specials' argument of
     'terms.formula').
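
     A sketch of an offset term with 'lm', using made-up data in
     'd_off': no coefficient is estimated for 'z', and the fit agrees
     with regressing 'y - z' on 'x'.

```r
# Made-up data, for illustration only.
d_off <- data.frame(x = 1:6,
                    z = c(1, 0, 2, 1, 0, 2),
                    y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2))

# z enters the linear predictor with its coefficient fixed at one.
fit_off <- lm(y ~ x + offset(z), data = d_off)
names(coef(fit_off))

# Equivalent fit with the offset moved to the left-hand side.
fit_lhs <- lm(I(y - z) ~ x, data = d_off)
all.equal(unname(coef(fit_off)), unname(coef(fit_lhs)))
```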

     There are two special interpretations of '.' in a formula.  The
     usual one is in the context of a 'data' argument of model fitting
     functions and means 'all columns not otherwise in the formula':
     see 'terms.formula'.  In the context of 'update.formula', *only*,
     it means 'what was previously in this part of the formula'.
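
     The two readings of '.' can be compared side by side.  This is a
     sketch with a made-up data frame 'd_dot'.

```r
# Made-up data, for illustration only.
d_dot <- data.frame(y = 1:4, a = c(2, 1, 4, 3), b = c(0, 1, 0, 1))

# With 'data': '.' stands for all columns not otherwise in the formula.
dot_labels <- attr(terms(y ~ ., data = d_dot), "term.labels")
dot_labels

# In update(): '.' stands for the corresponding part of the old formula.
fo_add <- update(y ~ a, . ~ . + b)
fo_log <- update(y ~ a, log(.) ~ .)
fo_add
fo_log
```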

     When 'formula' is called on a fitted model object, either a
     specific method is used (such as that for class '"nls"') or the
     default method.  The default first looks for a '"formula"'
     component of the object (and evaluates it), then a '"terms"'
     component, then a 'formula' parameter of the call (and evaluates
     its value) and finally a '"formula"' attribute.
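
     As an illustration, the formula can be recovered from an 'lm' fit.
     The data frame 'd_fit' is made up for this sketch.

```r
# Made-up data, for illustration only.
d_fit <- data.frame(x = 1:5, y = c(1.2, 1.9, 3.2, 3.8, 5.1))
fit <- lm(y ~ x, data = d_fit)

# Extract the formula that was used to fit the model.
fo_fit <- formula(fit)
class(fo_fit)
fo_fit
```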

     There is a method for data frames.  If there is only one column,
     it forms the RHS with an empty LHS.  For more columns, the
     first column is the LHS of the formula and the remaining columns,
     separated by '+', form the RHS.
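
     A sketch of both cases of the data frame method, with made-up data
     frames 'df_one' and 'df_many':

```r
df_one  <- data.frame(a = 1:3)
df_many <- data.frame(y = 1:3, a = 4:6, b = 7:9)

fo_one  <- formula(df_one)    # one column: one-sided formula
fo_many <- formula(df_many)   # first column is the response
fo_one
fo_many
```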

_V_a_l_u_e:

     All the functions above produce an object of class '"formula"'
     which contains a symbolic model formula.

_E_n_v_i_r_o_n_m_e_n_t_s:

     A formula object has an associated environment, and this
     environment (rather than the parent environment) is used by
     'model.frame' to evaluate variables that are not found in the
     supplied 'data' argument.

     Formulas created with the '~' operator use the environment in
     which they were created.  Formulas created with 'as.formula' will
     use the 'env' argument for their environment.  Pre-existing
     formulas extracted with 'as.formula' will only have their
     environment changed if 'env' is given explicitly.
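
     These rules can be sketched as follows.  'make_fo' is a
     hypothetical helper written for this illustration; it returns a
     formula together with the environment in which the formula was
     created.

```r
# Hypothetical helper: build a formula inside a function call.
make_fo <- function() {
  x <- c(10, 20, 30)   # visible only inside this call
  y <- c(1, 2, 3)
  list(fo = y ~ x, env = environment())
}
res <- make_fo()

# A formula made with '~' keeps the environment where it was created.
identical(environment(res$fo), res$env)

# With no 'data' argument, model.frame() looks up x and y there.
mf <- model.frame(res$fo)
mf$x

# For a character string, as.formula() attaches the 'env' argument.
e <- new.env()
fo_chr <- as.formula("y ~ x", env = e)
identical(environment(fo_chr), e)
```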

_R_e_f_e_r_e_n_c_e_s:

     Chambers, J. M. and Hastie, T. J. (1992) _Statistical models._
     Chapter 2 of _Statistical Models in S_ eds J. M. Chambers and T.
     J. Hastie, Wadsworth & Brooks/Cole.

_S_e_e _A_l_s_o:

     'I', 'offset'.

     For formula manipulation: 'terms', and 'all.vars'; for typical
     use: 'lm', 'glm', and 'coplot'.

_E_x_a_m_p_l_e_s:

     class(fo <- y ~ x1*x2) # "formula"
     fo
     typeof(fo)  # R internal: "language"
     terms(fo)

     environment(fo)
     environment(as.formula("y ~ x"))
     environment(as.formula("y ~ x", env=new.env()))

     ## Create a formula for a model with a large number of variables:
     xnam <- paste("x", 1:25, sep="")
     (fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))

