mgcv                  package:mgcv                  R Documentation

_M_u_l_t_i_p_l_e _S_m_o_o_t_h_i_n_g _P_a_r_a_m_e_t_e_r _E_s_t_i_m_a_t_i_o_n _b_y _G_C_V _o_r _U_B_R_E

_D_e_s_c_r_i_p_t_i_o_n:

     Function to efficiently estimate smoothing parameters in
     generalized ridge regression problems with multiple (quadratic)
     penalties, by GCV or UBRE. The function uses Newton's method in
     multiple dimensions, backed up by steepest descent, to iteratively
     adjust a set of relative smoothing parameters, one for each
     penalty. To ensure that the overall level of smoothing is optimal,
     and to guard against trapping by local minima, a highly efficient
     global minimisation with respect to one overall smoothing
     parameter is also made at each iteration.

     For a listing of all routines in the 'mgcv' package type:
      'library(help="mgcv")'

_U_s_a_g_e:

     mgcv(y,X,sp,S,off,C=NULL,w=rep(1,length(y)),H=NULL,
          scale=1,gcv=TRUE,control=mgcv.control())

_A_r_g_u_m_e_n_t_s:

       y: The response data vector.

       X: The design matrix for the problem. 'ncol(X)' must give the
          number of model parameters, and 'nrow(X)' the number of
          data.

      sp: An array of smoothing parameters. If 'control$fixed==TRUE'
          then these are taken as being the  smoothing parameters.
          Otherwise any positive values are assumed to be initial
          estimates and negative values to signal auto-initialization.

       S: A list of penalty matrices. Only the smallest square block
          containing all non-zero matrix elements is actually stored,
          and 'off[i]' indicates the element of the parameter vector
          that  'S[[i]][1,1]' relates to.

     off: Offset values indicating where in the overall parameter
          vector a particular stored penalty starts operating. For
          example, if 'p' is the model parameter vector and
          'k=nrow(S[[i]])-1', then the ith penalty is given by
           't(p[off[i]:(off[i]+k)])%*%S[[i]]%*%p[off[i]:(off[i]+k)]'.

       C: Matrix containing any linear equality constraints  on the
          problem (i.e. C in Cp=0).

       w: A vector of weights for the data (often proportional to the 
          reciprocal of the standard deviation of 'y'). 

       H: A single fixed penalty matrix to be used in place of the
          multiple  penalty matrices in 'S'. 'mgcv' cannot mix fixed
          and estimated penalties.

   scale: This is the known scale parameter/error variance to use with
          UBRE.  Note that it is assumed that the variance of y_i is 
          given by 'scale'/w_i.

     gcv: If 'gcv' is TRUE then smoothing parameters are estimated by
          GCV, otherwise UBRE is used.

 control: A list of control options returned by 'mgcv.control'.
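
     The relationship between 'S' and 'off' described above can be
     checked with a small base R sketch. The parameter vector, the
     penalty block and the offset below are invented for illustration,
     and the code assumes that 'off' is 1-based, exactly as in the
     formula given for 'off'. It shows that evaluating the penalty from
     the stored block gives the same value as padding the block out to
     a full-size matrix.

```r
# Hypothetical example: 5 parameters, one 3 x 3 penalty acting on p[2:4].
# Assumes 'off' uses R's 1-based indexing, as in the formula for 'off'.
p   <- c(0.5, 1, 2, 4, -1)
S   <- list(crossprod(diff(diag(3))))  # first-difference penalty (illustrative)
off <- 2

i   <- 1
k   <- nrow(S[[i]]) - 1
idx <- off[i]:(off[i] + k)

# Penalty evaluated from the stored block only
pen_block <- drop(t(p[idx]) %*% S[[i]] %*% p[idx])

# Same penalty after padding the block out to a full 5 x 5 matrix
S_full <- matrix(0, length(p), length(p))
S_full[idx, idx] <- S[[i]]
pen_full <- drop(t(p) %*% S_full %*% p)
```

     Storing only the non-zero block keeps the penalty list small when
     each penalty touches just a few neighbouring parameters.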

_D_e_t_a_i_l_s:

     This is documentation for the code implementing the method
     described in section 4 of Wood (2000). The method is a
     computationally efficient means of applying GCV to the problem
     of smoothing parameter selection in generalized ridge regression
     problems of the form:

 minimise || W (Xp-y) ||^2 rho +  lambda_1 p'S_1 p + lambda_2 p'S_2 p + . . .

     possibly subject to constraints Cp=0.  X is a design matrix, p a
     parameter vector,  y a data vector, W a diagonal weight matrix,
     S_i a positive semi-definite matrix  of coefficients defining the
     ith penalty and C a matrix of coefficients  defining any linear
     equality constraints on the problem. The smoothing parameters are
     the lambda_i but there is an overall smoothing parameter rho as
     well. Note that X must be of full column rank, at least when
     projected  into the null space of any equality constraints.  
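
     For fixed rho and lambda_i the problem above is an ordinary
     penalized least squares fit, which the following base R sketch
     solves directly. It is only an illustration of the objective, not
     the package's algorithm: the data, the two penalties and all
     variable names are made up, there are no constraints, and the GCV
     score is computed from the usual definition n*RSS/(n - tr(A))^2,
     where A is the influence matrix.

```r
# Simulated data; all names below are illustrative, not part of the package.
set.seed(1)
n <- 50
x <- seq(0, 1, length.out = n)
X <- cbind(1, x, x^2, x^3)          # full column rank design matrix
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)
W <- diag(n)                        # diagonal weight matrix (unit weights)
S1 <- diag(c(0, 0, 1, 0))           # penalises the quadratic coefficient
S2 <- diag(c(0, 0, 0, 1))           # penalises the cubic coefficient
rho <- 1
lambda <- c(0.1, 10)

# Setting the gradient of the objective to zero gives
# (rho X'W'WX + lambda_1 S_1 + lambda_2 S_2) p = rho X'W'W y
WX <- W %*% X
Wy <- W %*% y
A_pen <- rho * crossprod(WX) + lambda[1] * S1 + lambda[2] * S2
p_hat <- solve(A_pen, rho * crossprod(WX, Wy))

# The objective from the display above, for checking the solution
objective <- function(p) {
  rho * sum((W %*% (X %*% p - y))^2) +
    lambda[1] * drop(t(p) %*% S1 %*% p) +
    lambda[2] * drop(t(p) %*% S2 %*% p)
}

# Diagonal of the influence matrix, effective degrees of freedom, GCV score
hat_diag <- rho * rowSums((WX %*% solve(A_pen)) * WX)
edf <- sum(hat_diag)
gcv <- n * sum((Wy - WX %*% p_hat)^2) / (n - edf)^2
```

     Repeating this calculation over a grid of 'lambda' values and
     comparing 'gcv' gives a crude version of the search that the
     function performs far more efficiently.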

     The method operates by alternating very efficient direct searches
     for rho with Newton or steepest descent updates of the logs of
     the lambda_i. Because the GCV/UBRE scores are flat w.r.t. very
     large or very small lambda_i, it is important to get good starting
     parameters, and to be careful not to step into a flat region of
     the smoothing parameter space. For this reason the algorithm
     rescales any Newton step that would result in a log(lambda_i)
     change of more than 5. Newton steps are only used if the Hessian
     of the GCV/UBRE score is positive definite; otherwise steepest
     descent is used. Similarly, steepest descent is used if the Newton
     step has to be contracted too far (indicating that the quadratic
     model underlying Newton is poor). All initial steepest descent
     steps are scaled so that their largest component is 1. However a
     step is calculated, it is never expanded if it is successful (to
     avoid flat portions of the objective), but steps are successively
     halved if they do not decrease the GCV/UBRE score, until they do,
     or the direction is deemed to have failed. The 'info' element of
     the returned object provides some convergence diagnostics.
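
     The step selection rules in the paragraph above can be sketched in
     a few lines of base R. This is a simplified illustration on a
     generic score function, not the package's C code: the function
     name 'update_step', its arguments and the halving limit are all
     hypothetical, and the rule that falls back to steepest descent
     when a Newton step has been contracted too far is omitted.

```r
# One update of the log smoothing parameters 'theta' for a generic score.
# 'score', 'grad' and 'hess' are user-supplied functions (hypothetical names).
update_step <- function(theta, score, grad, hess,
                        max_change = 5, max_halvings = 30) {
  g <- grad(theta)
  H <- hess(theta)
  ev <- eigen(H, symmetric = TRUE, only.values = TRUE)$values
  if (all(ev > 0)) {
    step <- -solve(H, g)                 # Newton step (Hessian positive definite)
    biggest <- max(abs(step))
    if (biggest > max_change)            # cap the largest change at max_change
      step <- step * max_change / biggest
  } else {
    if (max(abs(g)) == 0) return(theta)  # zero gradient: nothing to do
    step <- -g / max(abs(g))             # steepest descent, largest component 1
  }
  f0 <- score(theta)
  for (j in seq_len(max_halvings)) {
    if (score(theta + step) < f0) return(theta + step)
    step <- step / 2                     # halve until the score decreases
  }
  theta                                  # direction failed: keep old values
}

# Toy quadratic score standing in for GCV/UBRE as a function of log(lambda)
score <- function(th) (th[1] - 1)^2 + 2 * (th[2] + 2)^2
grad  <- function(th) c(2 * (th[1] - 1), 4 * (th[2] + 2))
hess  <- function(th) diag(c(2, 4))
theta1 <- update_step(c(0, 0), score, grad, hess)
```

     Successful steps are returned as they are and never lengthened,
     which mirrors the rule that steps are not expanded.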

     The method is coded in 'C' and is intended to be portable. Note
     that seriously ill-conditioned problems (i.e. those with a design
     matrix that is close to column rank deficient) may cause
     problems, especially if weights vary wildly between
     observations.

_V_a_l_u_e:

     An object is returned with the following elements:

       b: The best fit parameters given the estimated smoothing
          parameters.

   scale: The estimated or supplied scale parameter/error variance.

   score: The UBRE or GCV score.

      sp: The estimated (or supplied) smoothing parameters
          (lambda_i/rho)

      Vb: Estimated covariance matrix of model parameters.

     hat: diagonal of the hat/influence matrix.

     edf: array of estimated degrees of freedom for each parameter.

    info: A list of convergence diagnostics, with the following
          elements:

        _e_d_f Array of whole model estimated degrees of freedom.

        _s_c_o_r_e Array of ubre/gcv scores at the edfs for the final set of
             relative smoothing parameters.

        _g the gradient of the GCV/UBRE score w.r.t. the smoothing
             parameters at termination.

        _h the second derivatives corresponding to 'g' above - i.e. the
             leading diagonal of the Hessian.

        _e the eigenvalues of the Hessian. These should all be
             non-negative!

        _i_t_e_r the number of iterations taken.

        _i_n._o_k 'TRUE' if the second smoothing parameter guess improved
             the GCV/UBRE score. (Please report examples  where this is
             'FALSE')

        _s_t_e_p._f_a_i_l 'TRUE' if the algorithm terminated by failing to
             improve the GCV/UBRE score rather than by "converging". 
             Not necessarily a problem, but check the above derivative
             information quite carefully.
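
     A check of the diagnostics listed above might look like the
     following sketch. The helper 'check_info' and its tolerance are
     hypothetical, and the 'info' list is a mock object with invented
     numbers that only borrows the documented element names; in
     practice the 'info' element of a fitted object would be passed in
     instead.

```r
# 'info' here is a mock list using the documented element names;
# the numbers are invented purely for illustration.
info <- list(g = c(1e-7, -3e-8), h = c(0.4, 1.2), e = c(0.35, 1.25),
             iter = 6, in.ok = TRUE, step.fail = FALSE)

# Hypothetical helper: collect warnings suggested by the diagnostics
check_info <- function(info, tol = 1e-8) {
  msgs <- character(0)
  if (any(info$e < -tol))
    msgs <- c(msgs, "Hessian has negative eigenvalues")
  if (isTRUE(info$step.fail))
    msgs <- c(msgs, "terminated by step failure; inspect g and h")
  if (!isTRUE(info$in.ok))
    msgs <- c(msgs, "second smoothing parameter guess did not improve the score")
  if (length(msgs) == 0) "no problems flagged" else msgs
}

check_info(info)
```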

_W_A_R_N_I_N_G:

     The method may not behave well when X is nearly column rank
     deficient, especially in contexts where the weights vary wildly.

_A_u_t_h_o_r(_s):

     Simon N. Wood simon.wood@r-project.org

_R_e_f_e_r_e_n_c_e_s:

     Gu and Wahba (1991) Minimizing GCV/GML scores with multiple
     smoothing parameters via the Newton method. SIAM J. Sci. Statist.
     Comput. 12:383-398

     Wood, S.N. (2000)  Modelling and Smoothing Parameter Estimation
     with Multiple  Quadratic Penalties. J.R.Statist.Soc.B
     62(2):413-428

     <URL: http://www.stats.gla.ac.uk/~simon/>

_S_e_e _A_l_s_o:

     'gam', 'magic'

_E_x_a_m_p_l_e_s:

     library(mgcv)
     library(help="mgcv") # listing of all routines

     # simulate data from an additive truth; x3 has no effect on y
     set.seed(1)
     n <- 400
     sig2 <- 4
     x0 <- runif(n, 0, 1); x1 <- runif(n, 0, 1)
     x2 <- runif(n, 0, 1); x3 <- runif(n, 0, 1)
     f <- 2 * sin(pi * x0)
     f <- f + exp(2 * x1) - 3.75887
     f <- f + 0.2*x2^11*(10*(1-x2))^6 + 10*(10*x2)^3*(1-x2)^10 - 1.396
     e <- rnorm(n, 0, sqrt(sig2))
     y <- f + e
     # set up additive model
     G <- gam(y~s(x0)+s(x1)+s(x2)+s(x3), fit=FALSE)
     # fit using mgcv
     mgfit <- mgcv(G$y, G$X, G$sp, G$S, G$off, C=G$C)
      

