factanal            package:stats            R Documentation(latin1)

_F_a_c_t_o_r _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     Perform maximum-likelihood factor analysis on a covariance matrix
     or data matrix.

_U_s_a_g_e:

     factanal(x, factors, data = NULL, covmat = NULL, n.obs = NA,
              subset, na.action, start = NULL,
              scores = c("none", "regression", "Bartlett"),
              rotation = "varimax", control = NULL, ...)

_A_r_g_u_m_e_n_t_s:

       x: A formula or a numeric matrix or an object that can be
          coerced to a numeric matrix.

 factors: The number of factors to be fitted.

    data: An optional data frame (or similar: see 'model.frame'), used
          only if 'x' is a formula.  By default the variables are taken
          from 'environment(formula)'.

  covmat: A covariance matrix, or a covariance list as returned by
          'cov.wt'.  Of course, correlation matrices are covariance
          matrices.

   n.obs: The number of observations, used if 'covmat' is a covariance
          matrix.

  subset: A specification of the cases to be used, if 'x' is used as a
          matrix or formula.

na.action: The 'na.action' to be used if 'x' is used as a formula.

   start: 'NULL' or a matrix of starting values, each column giving an
          initial set of uniquenesses.

  scores: Type of scores to produce, if any.  The default is none,
          '"regression"' gives Thomson's scores, '"Bartlett"' gives
          Bartlett's weighted least-squares scores. Partial matching
          allows these names to be abbreviated.

rotation: character. '"none"' or the name of a function to be used to
          rotate the factors: it will be called with first argument the
          loadings matrix, and should return a list with component
          'loadings' giving the rotated loadings, or just the rotated
          loadings.

 control: A list of control values:

          _n_s_t_a_r_t The number of starting values to be tried if 'start =
               NULL'. Default 1.

          _t_r_a_c_e logical. Output tracing information? Default 'FALSE'.

          _l_o_w_e_r The lower bound for uniquenesses during optimization.
               Should be > 0. Default 0.005.

          _o_p_t A list of control values to be passed to 'optim''s
               'control' argument.

          _r_o_t_a_t_e a list of additional arguments for the rotation
               function.

     ...: Components of 'control' can also be supplied as named
          arguments to 'factanal'.
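
     As a minimal sketch of the two ways of supplying control values,
     the following passes 'rotate' once inside 'control' and once as a
     named argument.  The data are those of the Examples section; the
     choice of 'normalize = FALSE' (an argument of 'varimax') and the
     names 'fit_a' and 'fit_b' are only illustrative.

```r
# Data copied from the Examples section of this page
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1,v2,v3,v4,v5,v6)

# 'rotate' supplied as a component of 'control' ...
fit_a <- factanal(m1, factors = 3,
                  control = list(rotate = list(normalize = FALSE)))

# ... and the same request supplied as a named argument via '...'
fit_b <- factanal(m1, factors = 3, rotate = list(normalize = FALSE))

# Both calls should give the same rotated loadings
all.equal(unclass(fit_a$loadings), unclass(fit_b$loadings))
```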

_D_e_t_a_i_l_s:

     The factor analysis model is

                           x = Lambda f + e

     for a p-element vector x, a p x k matrix Lambda of _loadings_, a
     k-element vector f of _scores_ and a p-element vector e of errors.
     None of the components other than x is observed, but the major
     restriction is that the scores be uncorrelated and of unit
     variance, and that the errors be independent with variances Psi,
     the _uniquenesses_.  Thus factor analysis is in essence a model
     for the covariance matrix of x,

                     Sigma = Lambda Lambda' + Psi

     There is still some indeterminacy in the model for it is unchanged
     if Lambda is replaced by Lambda G for any k x k orthogonal matrix
     G.  Such matrices G are known as _rotations_ (although the term is
     applied also to non-orthogonal invertible matrices).
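
     The indeterminacy can be checked numerically.  The sketch below
     takes Lambda to be the p x k loadings matrix returned by
     'factanal' (fitted without rotation), builds an arbitrary
     orthogonal matrix G from a QR decomposition, and compares the
     implied covariance matrices.  The data are those of the Examples
     section; 'G', 'Sigma_hat' and 'Sigma_rot' are names chosen here
     for illustration.

```r
# Data copied from the Examples section of this page
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1,v2,v3,v4,v5,v6)

fit <- factanal(m1, factors = 3, rotation = "none")
L   <- unclass(fit$loadings)     # p x k loadings (p = 6, k = 3)
Psi <- diag(fit$uniquenesses)    # diagonal matrix of uniquenesses

# Covariance matrix implied by the fitted model
Sigma_hat <- L %*% t(L) + Psi

# An arbitrary k x k orthogonal matrix (Q factor of a random matrix)
set.seed(1)
G <- qr.Q(qr(matrix(rnorm(9), 3, 3)))

# Replacing the loadings by L %*% G leaves the implied covariance unchanged
Sigma_rot <- (L %*% G) %*% t(L %*% G) + Psi
all.equal(Sigma_hat, Sigma_rot)
```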

     If 'covmat' is supplied it is used.  Otherwise 'x' is used if it
     is a matrix, or a formula 'x' is used with 'data' to construct a
     model matrix, and that is used to construct a covariance matrix. 
     (It makes no sense for the formula to have a response, and all the
     variables must be numeric.)  Once a covariance matrix is found or
     calculated from 'x', it is converted to a correlation matrix for
     analysis.  The correlation matrix is returned as component
     'correlation' of the result.
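
     A short sketch of the 'covmat' route, using the 'ability.cov'
     covariance list listed under See Also.  It fits the same model
     from the list and from the bare covariance matrix plus 'n.obs',
     and compares the 'correlation' component with 'cov2cor' of the
     input.  The choice of two factors and the names 'fit_list' and
     'fit_mat' are illustrative.

```r
# ability.cov (datasets package) is a covariance list with components
# cov, center and n.obs
fit_list <- factanal(factors = 2, covmat = ability.cov)

# A bare covariance matrix can be used instead; n.obs is then
# supplied separately
fit_mat <- factanal(factors = 2, covmat = ability.cov$cov,
                    n.obs = ability.cov$n.obs)

# In both cases the analysis is carried out on the correlation matrix
all.equal(fit_list$correlation, cov2cor(ability.cov$cov),
          check.attributes = FALSE)

# The two ways of supplying the covariance give the same loadings
all.equal(unclass(fit_list$loadings), unclass(fit_mat$loadings))
```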

     The fit is done by optimizing the log likelihood, assuming
     multivariate normality, over the uniquenesses.  (The maximizing
     loadings for given uniquenesses can be found analytically: Lawley
     & Maxwell (1971, p. 27).)  All the starting values supplied in
     'start' are tried in turn and the best fit obtained is used.  If
     'start = NULL' then the first fit is started at the value
     suggested by Joreskog (1963) and given by Lawley & Maxwell (1971,
     p. 31), and then 'control$nstart - 1' other values are tried,
     randomly selected as equal values of the uniquenesses.

     The uniquenesses are technically constrained to lie in [0, 1], but
     near-zero values are problematical, and the optimization is done
     with a lower bound of 'control$lower', default 0.005 (Lawley &
     Maxwell, 1971, p. 32).
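
     The effect of several starting values and of the lower bound can
     be inspected as follows.  This is only a sketch on the data of
     the Examples section: 'nstart = 5' is an arbitrary choice, and
     the seed is set because the extra starting values are random.

```r
# Data copied from the Examples section of this page
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1,v2,v3,v4,v5,v6)

set.seed(1)  # the additional starting values are drawn at random

fit1 <- factanal(m1, factors = 3)                               # one start
fit5 <- factanal(m1, factors = 3, control = list(nstart = 5))   # five starts

# Objective (negative log-likelihood criterion) of the fit retained;
# smaller is better
c(single = fit1$criteria[["objective"]],
  multi  = fit5$criteria[["objective"]])

# The fitted uniquenesses respect the default lower bound of 0.005
range(fit5$uniquenesses)
```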

     Scores can only be produced if a data matrix is supplied and used.
     The first method is the regression method of Thomson (1951), the
     second the weighted least squares method of Bartlett (1937, 1938).
     Both are estimates of the unobserved scores f.  Thomson's method
     regresses (in the population) the unknown f on x to yield

                      hat f = Lambda' Sigma^-1 x

     and then substitutes the sample estimates of the quantities on the
     right-hand side.  Bartlett's method minimizes the sum of squares
     of standardized errors over the choice of f, given (the fitted)
     Lambda.
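
     Both kinds of scores can be reproduced by hand, which may help in
     reading the formulas above.  The sketch below assumes that the
     data are standardized with 'scale', that the sample correlation
     matrix returned as 'correlation' is the estimate of Sigma that is
     substituted, and that no rotation is applied; these are
     assumptions of the sketch rather than statements made elsewhere
     on this page.  The data are those of the Examples section.

```r
# Data copied from the Examples section of this page
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1,v2,v3,v4,v5,v6)

z <- scale(m1)   # centred and scaled data, one case per row

## Thomson's regression scores: row i is z_i' Sigma^-1 Lambda
fit_r <- factanal(m1, factors = 3, scores = "regression",
                  rotation = "none")
L_r <- unclass(fit_r$loadings)
manual_r <- z %*% solve(fit_r$correlation, L_r)
all.equal(manual_r, fit_r$scores, check.attributes = FALSE)

## Bartlett's scores: weighted least squares with weights 1/uniqueness
fit_b <- factanal(m1, factors = 3, scores = "Bartlett",
                  rotation = "none")
L_b <- unclass(fit_b$loadings)
W <- t(L_b / fit_b$uniquenesses)              # Lambda' Psi^-1  (k x p)
manual_b <- t(solve(W %*% L_b, W %*% t(z)))   # one row of scores per case
all.equal(manual_b, fit_b$scores, check.attributes = FALSE)
```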

     If 'x' is a formula then the standard NA-handling is applied to
     the scores (if requested): see 'napredict'.

_V_a_l_u_e:

     An object of class '"factanal"' with components 

loadings: A matrix of loadings, one column for each factor.  The
          factors are ordered in decreasing order of sums of squares of
          loadings, and given the sign that will make the sum of the
          loadings positive.

uniquenesses: The uniquenesses computed.

correlation: The correlation matrix used.

criteria: The results of the optimization: the value of the negative
          log-likelihood and information on the iterations used.

 factors: The argument 'factors'.

     dof: The number of degrees of freedom of the factor analysis
          model.

  method: The method: always '"mle"'.

  scores: If requested, a matrix of scores.  'napredict' is applied to
          handle the treatment of values omitted by the 'na.action'.

   n.obs: The number of observations if available, or 'NA'.

    call: The matched call.

na.action: If relevant.

STATISTIC, PVAL: The significance-test statistic and P value, if they
          can be computed.

_N_o_t_e:

     There are so many variations on factor analysis that it is hard to
     compare output from different programs.  Further, the optimization
     in maximum likelihood factor analysis is hard, and many other
     examples we compared had worse fits than those produced by this
     function.  In particular, solutions which are Heywood cases (with
     one or more uniquenesses essentially zero) are much more common
     than most texts and some other programs would lead one to believe.

_R_e_f_e_r_e_n_c_e_s:

     Bartlett, M. S. (1937) The statistical conception of mental
     factors. _British Journal of Psychology_, *28*, 97-104.

     Bartlett, M. S. (1938) Methods of estimating mental factors.
     _Nature_, *141*, 609-610.

     Joreskog, K. G. (1963) _Statistical Estimation in Factor
     Analysis._  Almqvist and Wicksell.

     Lawley, D. N. and Maxwell, A. E. (1971) _Factor Analysis as a
     Statistical Method._ Second edition. Butterworths.

     Thomson, G. H. (1951) _The Factorial Analysis of Human Ability._
     London University Press.

_S_e_e _A_l_s_o:

     'print.loadings', 'varimax', 'princomp', 'ability.cov',
     'Harman23.cor', 'Harman74.cor'

_E_x_a_m_p_l_e_s:

     # A little demonstration, v2 is just v1 with noise,
     # and same for v4 vs. v3 and v6 vs. v5
     # Last four cases are there to add noise
     # and introduce a positive manifold (g factor)
     v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
     v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
     v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
     v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
     v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
     v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
     m1 <- cbind(v1,v2,v3,v4,v5,v6)
     cor(m1)
     factanal(m1, factors=3) # varimax is the default
     factanal(m1, factors=3, rotation="promax")
     # The following shows the g factor as PC1
     prcomp(m1)

     ## formula interface
     factanal(~v1+v2+v3+v4+v5+v6, factors = 3,
              scores = "Bartlett")$scores

     ## a realistic example from Bartholomew (1987, pp. 61-65)
     utils::example(ability.cov)

