loglm                  package:MASS                  R Documentation

_F_i_t _L_o_g-_L_i_n_e_a_r _M_o_d_e_l_s _b_y _I_t_e_r_a_t_i_v_e _P_r_o_p_o_r_t_i_o_n_a_l _S_c_a_l_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     This function provides a front-end to the standard function,
     'loglin', to allow log-linear models to be specified and fitted in
     a manner similar to that of other fitting functions, such as
     'glm'.

_U_s_a_g_e:

     loglm(formula, data, subset, na.action, ...)

_A_r_g_u_m_e_n_t_s:

 formula: A linear model formula specifying the log-linear model.

          If the left-hand side is empty, the 'data' argument is
          required and must be a (complete) array of frequencies.  In
          this case the variables on the right-hand side may be the
          names of the 'dimnames' attribute of the frequency array, or
          may be the positive integers: 1, 2, 3, ... used as
          alternative names for the 1st, 2nd, 3rd, ... dimension
          (classifying factor).

          If the left-hand side is not empty, it specifies a vector of
          frequencies.  In this case the 'data' argument, if present,
          must be a data frame from which the left-hand side vector
          and the classifying factors on the right-hand side are
          (preferentially) obtained.  The usual abbreviation of a '.'
          to stand for 'all other variables in the data frame' is
          allowed.  Any non-factors on the right-hand side of the
          formula are coerced to factor.

    data: Numeric array or data frame.  In the first case it specifies
          the array of frequencies; in the second it provides the data
          frame from which the variables occurring in the formula are
          preferentially obtained in the usual way.

          This argument may be the result of a call to 'xtabs'. 

  subset: Specifies a subset of the rows in the data frame to be used. 
          The default is to take all rows. 

na.action: Specifies a method for handling missing observations.  The
          default is to fail if missing values are present. 

     ...: May supply other arguments to the function 'loglm1'. 
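
     As a minimal sketch of the two ways of writing 'formula' (an
     illustration only; it assumes the 'HairEyeColor' table from the
     'datasets' package, which is not used elsewhere on this page),
     the same mutual-independence model can be fitted from an array,
     using either dimension names or integers, and from a data frame
     holding a vector of frequencies:

```r
library(MASS)

# Array input: empty left-hand side, dimensions referred to by name.
fit_names <- loglm(~ Hair + Eye + Sex, data = HairEyeColor)

# The same model with integers standing for the 1st, 2nd and 3rd dimension.
fit_nums <- loglm(~ 1 + 2 + 3, data = HairEyeColor)

# Data-frame input: the left-hand side names the vector of frequencies.
hec_df <- as.data.frame(HairEyeColor)   # columns Hair, Eye, Sex, Freq
fit_df <- loglm(Freq ~ Hair + Eye + Sex, data = hec_df)

# All three calls should describe the same fit.
c(deviance(fit_names), deviance(fit_nums), deviance(fit_df))
c(fit_names$df, fit_nums$df, fit_df$df)
```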

_D_e_t_a_i_l_s:

     If the left-hand side of the formula is empty the 'data' argument
     supplies the frequency array and the right-hand side of the
     formula is used to construct the list of fixed faces as required
     by 'loglin'.  Structural zeros may be specified by giving a
     'start' argument with those entries set to zero, as described in
     the help information for 'loglin'.
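
     A small sketch of the 'start' mechanism, using made-up counts
     (the table 'moves' and its dimension names are hypothetical and
     chosen only for illustration).  The diagonal cells are taken to
     be impossible, so they are zero in both the data and 'start':

```r
library(MASS)

# Hypothetical 3 x 3 table in which the diagonal cells cannot occur.
moves <- matrix(c( 0, 12,  7,
                   9,  0, 14,
                   5, 11,  0),
                nrow = 3, byrow = TRUE,
                dimnames = list(from = c("a", "b", "c"),
                                to   = c("a", "b", "c")))

# Starting table: 0 marks a structural zero, 1 marks a free cell.
st <- 1 - diag(3)

# 'start' is passed on through '...'; 'fitted = TRUE' keeps the fitted table.
qi <- loglm(~ from + to, data = moves, start = st, fitted = TRUE)

fitted(qi)      # the diagonal cells stay at zero
deviance(qi)
qi$df           # degrees of freedom after the deduction for structural zeros
```

     The reported degrees of freedom should be read together with the
     Warning section of this page.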

     If the left-hand side is not empty, all variables on the
     right-hand side are regarded as classifying factors and an array
     of frequencies is constructed.  If some cells in the complete
     array are not specified they are treated as structural zeros. The
     right-hand side of the formula is again used to construct the list
     of faces on which the observed and fitted totals must agree, as
     required by 'loglin'.  Hence terms such as 'a:b', 'a*b' and 'a/b'
     are all equivalent.
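
     The equivalence can be checked directly.  The sketch below is an
     illustration that assumes the 'HairEyeColor' table from the
     'datasets' package; each of the three formulas fixes the same
     faces (Hair by Eye, and Sex), so the fits should coincide:

```r
library(MASS)

# Three spellings that all fix the Hair x Eye face and the Sex face.
f_colon <- loglm(~ Hair:Eye + Sex, data = HairEyeColor)
f_star  <- loglm(~ Hair*Eye + Sex, data = HairEyeColor)
f_slash <- loglm(~ Hair/Eye + Sex, data = HairEyeColor)

fits <- list(colon = f_colon, star = f_star, slash = f_slash)
sapply(fits, deviance)               # deviances should agree
sapply(fits, function(m) m$df)       # and so should the degrees of freedom
```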

_V_a_l_u_e:

     An object of class '"loglm"' conveying the results of the fitted
     log-linear model.  Methods exist for the generic functions
     'print', 'summary', 'deviance', 'fitted', 'coef', 'resid', 'anova'
     and 'update', which perform the expected tasks.  Only
     log-likelihood ratio tests are allowed using 'anova'.

     The deviance is simply an alternative name for the log-likelihood
     ratio statistic for testing the current model within a saturated
     model, in accordance with standard usage in generalized linear
     models.
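
     To make the meaning of the deviance concrete, the following
     sketch recomputes the log-likelihood ratio statistic from the
     observed and fitted tables.  It is an illustration only and
     assumes the 'HairEyeColor' table from the 'datasets' package;
     the names 'hair_eye' and 'lrt_by_hand' are introduced here:

```r
library(MASS)

# Independence model for the Hair x Eye margin; keep the fitted table.
hair_eye <- margin.table(HairEyeColor, c(1, 2))
fm <- loglm(~ Hair + Eye, data = hair_eye, fitted = TRUE)

obs <- unclass(hair_eye)
fit <- fitted(fm)

# Likelihood ratio statistic against the saturated model, with the usual
# convention that cells with a zero count contribute nothing.
pos <- obs > 0
lrt_by_hand <- 2 * sum(obs[pos] * log(obs[pos] / fit[pos]))

c(deviance = deviance(fm), by_hand = lrt_by_hand)
```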

_W_a_r_n_i_n_g:

     If structural zeros are present, the calculation of degrees of
     freedom may not be correct.  'loglin' itself takes no action to
     allow for structural zeros.  'loglm' deducts one degree of freedom
     for each structural zero, but cannot make allowance for gains in
     error degrees of freedom due to loss of dimension in the model
     space.  (This would require checking the rank of the model matrix,
     but since iterative proportional scaling methods are developed
     largely to avoid constructing the model matrix explicitly, the
     computation is at least difficult.)

     When structural zeros (or zero fitted values) are present the
     estimated coefficients will not be available due to infinite
     estimates.  The deviances will normally continue to be correct,
     though.

_R_e_f_e_r_e_n_c_e_s:

     Venables, W. N. and Ripley, B. D. (2002) _Modern Applied
     Statistics with S._ Fourth edition.  Springer.

_S_e_e _A_l_s_o:

     'loglm1', 'loglin'

_E_x_a_m_p_l_e_s:

     # The data frames Cars93, minn38 and quine are available
     # in the MASS package.
     library(MASS)

     # Case 1: frequencies specified as an array.
     sapply(minn38, function(x) length(levels(x)))
     ## hs phs fol sex f
     ##  3   4   7   2 0
     minn38a <- array(0, c(3,4,7,2), lapply(minn38[, -5], levels))
     minn38a[data.matrix(minn38[,-5])] <- minn38$f
     fm <- loglm(~1 + 2 + 3 + 4, minn38a)  # numerals as names.
     deviance(fm)
     ##[1] 3711.9
     fm1 <- update(fm, .~.^2)
     fm2 <- update(fm, .~.^3, print = TRUE)
     ## 5 iterations: deviation 0.0750732
     anova(fm, fm1, fm2)
     ## LR tests for hierarchical log-linear models
     ##
     ## Model 1:
     ##   ~  1 + 2 + 3 + 4
     ## Model 2:
     ##  .  ~  1 + 2 + 3 + 4 + 1:2 + 1:3 + 1:4 + 2:3 + 2:4 + 3:4
     ## Model 3:
     ##  .  ~  1 + 2 + 3 + 4 + 1:2 + 1:3 + 1:4 + 2:3 + 2:4 + 3:4 +
     ##         1:2:3 + 1:2:4 + 1:3:4 + 2:3:4
     ##
     ##           Deviance  df Delta(Dev) Delta(df) P(> Delta(Dev)
     ##   Model 1 3711.915 155
     ##   Model 2  220.043 108   3491.873        47        0.00000
     ##   Model 3   47.745  36    172.298        72        0.00000
     ## Saturated    0.000   0     47.745        36        0.09114

     # Case 1. An array generated with xtabs.

     loglm(~ Type + Origin, xtabs(~ Type + Origin, Cars93))
     ## Call:
     ## loglm(formula = ~Type + Origin, data = xtabs(~Type + Origin,
     ##     Cars93))
     ##
     ## Statistics:
     ##                     X^2 df  P(> X^2)
     ## Likelihood Ratio 18.362  5 0.0025255
     ##          Pearson 14.080  5 0.0151101

     # Case 2.  Frequencies given as a vector in a data frame
     names(quine)
     ## [1] "Eth"  "Sex"  "Age"  "Lrn"  "Days"
     fm <- loglm(Days ~ .^2, quine)
     gm <- glm(Days ~ .^2, poisson, quine)  # check glm.
     c(deviance(fm), deviance(gm))          # deviances agree
     ## [1] 1368.7 1368.7
     c(fm$df, gm$df.residual)               # resid df do not!
     ## [1] 127 128
     # The loglm residual degrees of freedom are wrong because of
     # a non-detectable redundancy in the model matrix.

