magic                  package:mgcv                  R Documentation

_S_t_a_b_l_e _M_u_l_t_i_p_l_e _S_m_o_o_t_h_i_n_g _P_a_r_a_m_e_t_e_r _E_s_t_i_m_a_t_i_o_n _b_y _G_C_V _o_r _U_B_R_E, 
_w_i_t_h _o_p_t_i_o_n_a_l _f_i_x_e_d _p_e_n_a_l_t_y

_D_e_s_c_r_i_p_t_i_o_n:

     Function to efficiently estimate smoothing parameters in
     generalized ridge regression problems with multiple (quadratic)
     penalties, by GCV  or UBRE. The function uses Newton's method in
     multi-dimensions, backed up by  steepest descent to iteratively
     adjust the smoothing parameters for each penalty  (one penalty may
     have a smoothing parameter fixed at 1). 

     For maximal numerical stability the method is based on orthogonal
     decomposition methods,  and attempts to deal with numerical rank
     deficiency gracefully using a truncated singular  value
     decomposition approach.

_U_s_a_g_e:

     magic(y,X,sp,S,off,L=NULL,lsp0=NULL,rank=NULL,H=NULL,C=NULL,w=NULL,gamma=1,
           scale=1,gcv=TRUE,ridge.parameter=NULL,
           control=list(maxit=50,tol=1e-6,step.half=25,
           rank.tol=.Machine$double.eps^0.5),extra.rss=0,n.score=length(y))

_A_r_g_u_m_e_n_t_s:

       y: is the response data vector.

       X: is the model matrix.

      sp: is the array of smoothing parameters. The vector 'L%*%log(sp)
          + lsp0'  contains the logs of the smoothing parameters that
          actually multiply the penalty matrices stored in  'S' ('L' is
          taken as the identity if 'NULL').  Any 'sp' values that are
          negative are autoinitialized, otherwise they are taken as
          supplying  starting values. A supplied starting value will be
          reset to a default starting value if the gradient of the
          GCV/UBRE score is too small at the supplied value.  

       S: is a list of of penalty matrices. 'S[[i]]' is the ith penalty
          matrix, but note that it is not stored as a full matrix, but
          rather as the smallest square matrix including all  the
          non-zero elements of the penalty matrix. Element 1,1 of
          'S[[i]]'  occupies  element 'off[i]', 'off[i]' of the ith
          penalty matrix. Each 'S[[i]]' must be  positive
          semi-definite. Set to 'list()' if there are no smoothing
          parameters to be estimated. 

     off: is an array indicating the first parameter in the parameter
          vector that is  penalized by the penalty involving 'S[[i]]'.

       L: is a matrix mapping 'log(sp)' to the log smoothing parameters
          that actually multiply the penalties defined by the elemts of
          'S'. Taken as the identity, if 'NULL'. See above under 'sp'.

    lsp0: If 'L' is not 'NULL' this is a vector of constants in the
          linear transformation from 'log(sp)' to the actual log
          smoothing parameters. So the logs of the smoothing parameters
          multiplying the 'S[[i]]' are given by 'L%*%log(sp) + lsp0'.
          Taken as 0 if 'NULL'.

    rank: is an array specifying the ranks of the penalties. This is
          useful, but not  essential, for forming square roots of the
          penalty matrices.

       H: is the optional offset penalty - i.e. a penalty with a
          smoothing parameter fixed at  1. This is useful for allowing
          regularization of the estimation process, fixed smoothing 
          penalties etc.

       C: is the optional matrix specifying any linear equality
          constraints on the fitting  problem. If b is the parameter
          vector then the parameters are forced to satisfy  Cb=0. 

       w: the regression weights. If this is a matrix then it is taken
          as being the  square root of the inverse of the covariance
          matrix of 'y', specifically  V_y^{-1}=w'w. If 'w' is an array
          then  it is taken as the diagonal of this matrix, or simply
          the weight for each element of  'y'. See below for an example
          using this.

   gamma: is an inflation factor for the model degrees of freedom in
          the GCV or UBRE  score.

   scale: is the scale parameter for use with UBRE.

     gcv: should be set to 'TRUE' if GCV is to be used, 'FALSE' for
          UBRE.

ridge.parameter: It is sometimes useful to apply a ridge penalty to the
          fitting problem,  penalizing the parameters in the
          constrained space directly. Setting this parameter to a value
           greater than zero will cause such a penalty to be used, with
          the magnitude given by the  parameter value.

 control: is a list of iteration control constants with the following
          elements:

     _m_a_x_i_t The maximum number of iterations of the magic algorithm to
          allow.

     _t_o_l The tolerance to use in judging convergence.

     _s_t_e_p._h_a_l_f If a trial step fails then the method tries halving it
          up to a maximum of  'step.half' times.

     _r_a_n_k._t_o_l is a constant used to test for numerical rank deficiency
          of the problem.  Basically any singular value less than
          'rank_tol' multiplied by the largest singular value of  the 
          problem is set to zero.

extra.rss: is a constant to be added to the residual sum of squares
          (squared norm) term in the calculation of the GCV, UBRE and
          scale parameter estimate. In conjuction with 'n.score', this
          is useful for certain methods for dealing with very large
          data sets.

 n.score: number to use as the number of data in GCV/UBRE score
          calculation: usually the actual number of data, but there are
          methods  for dealing with very large datasets that change
          this.

_D_e_t_a_i_l_s:

     The method is a computationally efficient means of applying GCV or
     UBRE (often approximately  AIC) to the  problem of smoothing
     parameter selection in generalized ridge regression problems  of
     the form:

 minimise || W (Xb-y) ||^2 + b'Hb + theta_1 b'S_1 b + theta_2 b'S_2 b + . . .

     possibly subject to constraints Cb=0.  X is a design matrix, b a
     parameter vector,  y a data vector, W a weight matrix, S_i a
     positive semi-definite matrix  of coefficients defining the ith
     penalty with associated smoothing parameter theta_i,  H is the
     positive semi-definite offset penalty matrix  and C a  matrix of
     coefficients defining any linear equality constraints on the
     problem.  X need not be of full column rank.

     The theta_i are chosen to minimize either the GCV score:


                V_g = n ||W(y-Ay)||^2/[tr(I - g A)]^2


     or the UBRE score:


             V_u =||W(y-Ay||^2/n - 2 s tr(I - g A)/n + s


     where g is 'gamma' the inflation factor for degrees of freedom
     (usually set to 1) and s  is 'scale', the scale parameter. A is
     the hat matrix (influence matrix) for the fitting problem (i.e the
     matrix mapping data to fitted values). Dependence of the scores on
     the smoothing parameters is through A. 

     The method operates by  Newton or steepest descent updates of the
     logs of the  theta_i. A key aspect of the method is stable and
     economical calculation of the  first and second derivatives of the
     scores w.r.t. the log smoothing parameters.  Because the GCV/UBRE
     scores are flat w.r.t. very large or very small theta_i,  it's
     important to get good starting parameters, and to be careful not
     to step into a flat region of the smoothing parameter space. For
     this reason the algorithm rescales any Newton step that  would
     result in a log(theta_i) change of more than 5. Newton steps are 
     only used if the Hessian of the GCV/UBRE is postive definite,
     otherwise steepest descent is  used. Similarly steepest descent is
     used if the Newton step has to be contracted too far  (indicating
     that the quadratic model underlying Newton is poor). All initial
     steepest descent  steps are scaled so that their largest component
     is 1. However a step is calculated,  it is never expanded if it is
     successful (to avoid flat portions of the objective),  but steps
     are successively halved if they do not decrease the GCV/UBRE
     score, until  they do, or the direction is deemed to have failed.
     (Given the smoothing parameters the optimal  b parameters are
     easily found.)

     The method is coded in 'C' with matrix factorizations performed
     using LINPACK and LAPACK routines.

_V_a_l_u_e:

     The function returns a list with the following items:

       b: The best fit parameters given the estimated smoothing
          parameters.

   scale: the estimated (GCV) or supplied (UBRE) scale parameter.

   score: the minimized GCV or UBRE score.

      sp: an array of the estimated smoothing parameters.

 sp.full: an array of the smoothing parameters that actually multiply
          the elements of 'S' (same as 'sp' if 'L' was 'NULL'). This is
          'exp(L%*%log(sp))'.

      rV: a factored form of the parameter covariance matrix. The
          (Bayesian)  covariance matrix of the parametes 'b' is given
          by 'rV%*%t(rV)*scale'. 

gcv.info: is a list of information about the performance of the method
          with the following elements:

        _f_u_l_l._r_a_n_k The apparent rank of the problem: number of
             parameters less number of equality constraints.

        _r_a_n_k The estimated actual rank of the problem (at the final
             iteration of the method).

        _f_u_l_l_y._c_o_n_v_e_r_g_e_d is 'TRUE' if the method converged by satisfying
             the convergence criteria, and 'FALSE' if it coverged  by
             failing to decrease the score along the search direction.

        _h_e_s_s._p_o_s._d_e_f is 'TRUE' if the hessian of the UBRE or GCV score
             was positive definite at convergence.

        _i_t_e_r is the number of Newton/Steepest descent iterations taken.

        _s_c_o_r_e._c_a_l_l_s is the number of times that the GCV/UBRE score had
             to be evaluated.

        _r_m_s._g_r_a_d is the root mean square of the gradient of the
             UBRE/GCV score w.r.t. the smoothing parameters.

     Note that some further useful quantities can be obtained using
     'magic.post.proc'.

_A_u_t_h_o_r(_s):

     Simon N. Wood simon.wood@r-project.org

_R_e_f_e_r_e_n_c_e_s:

     Wood, S.N. (2004) Stable and efficient multiple smoothing
     parameter estimation for generalized additive models. J. Amer.
     Statist. Ass. 99:673-686

     <URL: http://www.maths.bath.ac.uk/~sw283/>

_S_e_e _A_l_s_o:

     'magic.post.proc', 'mgcv','gam'

_E_x_a_m_p_l_e_s:

     ## Use `magic' for a standard additive model fit ... 
        library(mgcv)
        set.seed(1);n <- 400;sig <- 2
        dat <- gamSim(1,n=n,scale=sig)
     ## set up additive model
        G <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),fit=FALSE,data=dat)
     ## fit using magic
        mgfit <- magic(G$y,G$X,G$sp,G$S,G$off,rank=G$rank,C=G$C)
     ## and fit using gam as consistency check
        b <- gam(G=G)
        mgfit$sp;b$sp  # compare smoothing parameter estimates
        edf <- magic.post.proc(G$X,mgfit,G$w)$edf  # extract e.d.f. per parameter
        range(edf-b$edf)  # compare

     ## constrain first two smooths to have identical smoothing parameters
        L <- diag(3);L <- rbind(L[1,],L)
        mgfit <- magic(G$y,G$X,rep(-1,3),G$S,G$off,L=L,rank=G$rank,C=G$C)

     ## Now a correlated data example ... 
         library(nlme)
     ## simulate truth
         set.seed(1);n<-400;sig<-2
         x <- 0:(n-1)/(n-1)
         f <- 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
     ## produce scaled covariance matrix for AR1 errors...
         V <- corMatrix(Initialize(corAR1(.6),data.frame(x=x)))
         Cv <- chol(V)  # t(Cv)
     ## Simulate AR1 errors ...
         e <- t(Cv)%*%rnorm(n,0,sig) # so cov(e) = V * sig^2
     ## Observe truth + AR1 errors
         y <- f + e 
     ## GAM ignoring correlation
         par(mfrow=c(1,2))
         b <- gam(y~s(x,k=20))
         plot(b);lines(x,f-mean(f),col=2);title("Ignoring correlation")
     ## Fit smooth, taking account of *known* correlation...
         w <- solve(t(Cv)) # V^{-1} = w'w
         ## Use `gam' to set up model for fitting...
         G <- gam(y~s(x,k=20),fit=FALSE)
         ## fit using magic, with weight *matrix*
         mgfit <- magic(G$y,G$X,G$sp,G$S,G$off,rank=G$rank,C=G$C,w=w)
     ## Modify previous gam object using new fit, for plotting...    
         mg.stuff <- magic.post.proc(G$X,mgfit,w)
         b$edf <- mg.stuff$edf;b$Vp <- mg.stuff$Vb
         b$coefficients <- mgfit$b 
         plot(b);lines(x,f-mean(f),col=2);title("Known correlation")

