tsboot                 package:boot                 R Documentation

_B_o_o_t_s_t_r_a_p_p_i_n_g _o_f _T_i_m_e _S_e_r_i_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Generate 'R' bootstrap replicates of a statistic applied to a time
     series.  The replicate time series can be generated using fixed or
     random block lengths or can be model based replicates.

_U_s_a_g_e:

     tsboot(tseries, statistic, R, l=NULL, sim="model", endcorr=TRUE, 
            n.sim=NROW(tseries), orig.t=TRUE, ran.gen, 
            ran.args=NULL, norm=TRUE, ...)

_A_r_g_u_m_e_n_t_s:

 tseries: A univariate or multivariate time series. 

statistic: A function which when applied to 'tseries' returns a vector
          containing the statistic(s) of interest.  Each time
          'statistic' is called it is passed a time series of length
          'n.sim' which is of the same class as the original 'tseries'.
          Any other arguments which 'statistic' takes must remain
          constant for each bootstrap replicate and should be supplied
          through the '...' argument to 'tsboot'. 

       R: A positive integer giving the number of bootstrap replicates
          required.   

     sim: The type of simulation required to generate the replicate
          time series.  The possible input values are '"model"' (model
          based resampling), '"fixed"' (block resampling with fixed
          block lengths of 'l'), '"geom"' (block resampling with block
          lengths having a geometric distribution with mean 'l') or
          '"scramble"' (phase scrambling).   

       l: If 'sim' is '"fixed"' then 'l' is the fixed block length used
          in generating the replicate time series.  If 'sim' is
          '"geom"' then 'l' is the mean of the geometric distribution
          used to generate the block lengths. 'l' should be a positive
          integer less than the length of 'tseries'.  This argument is
          required for block resampling ('"fixed"' or '"geom"'); it is
          not required when 'sim' is '"model"' or '"scramble"'. 

 endcorr: A logical variable indicating whether end corrections are to
          be applied when 'sim' is '"fixed"'.  When 'sim' is '"geom"',
          'endcorr' is automatically set to 'TRUE'; 'endcorr' is not
          used when 'sim' is '"model"' or '"scramble"'. 

   n.sim: The length of the simulated time series.  Typically this will
          be equal to the length of 'tseries' but there are situations
          when it will be larger.  One obvious situation is if
          prediction is required.  Another is if 'tseries' is a
          residual time series from fitting some model to the original
          time series, and so is shorter than it.  In this case,
          'n.sim' would usually be the length of the original time
          series. 

  orig.t: A logical variable which indicates whether 'statistic' should
          be applied to 'tseries' itself as well as the bootstrap
          replicate series.  If 'statistic' is expecting a longer time
          series than 'tseries' or if applying 'statistic' to 'tseries'
          will not yield any useful information then 'orig.t' should be
          set to 'FALSE'. 

 ran.gen: This is a function of three arguments.  The first argument is
          a time series.  If 'sim' is '"model"' then it will always
          be 'tseries' that is passed.  For other simulation types it
          is the result of selecting 'n.sim' observations from
          'tseries' by some scheme and converting the result back into
          a time series of the same form as 'tseries' (although of
          length 'n.sim').  The second argument to 'ran.gen' is always
          the value 'n.sim', and the third argument is 'ran.args',
          which is used to supply any other objects needed by
          'ran.gen'.  If 'sim' is '"model"' then the generation of the
          replicate time series will be done in 'ran.gen' (for example
          through use of 'arima.sim'). For the other simulation types
          'ran.gen' is used for "post-blackening".  The default is that
          the function simply returns the time series passed to it. 

ran.args: This will be supplied to 'ran.gen' each time it is called. 
          If 'ran.gen' needs any extra arguments then they should be
          supplied as components of 'ran.args'.  Multiple arguments
          may be passed by making 'ran.args' a list.  If 'ran.args' is
          'NULL' then it should not be used within 'ran.gen' but note
          that 'ran.gen' must still have its third argument. 

    norm: A logical argument indicating whether normal margins should
          be used for phase scrambling.  If 'norm' is 'FALSE' then
          margins corresponding to the exact empirical margins are
          used. 

     ...: Any extra arguments to 'statistic' may be supplied here. 
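
     As an illustration of the 'statistic' and '...' arguments, the
     following sketch passes one extra argument through 'tsboot' to
     the statistic.  The function 'acf.lag' and its argument 'k' are
     hypothetical names introduced here for illustration; they are not
     part of the package.

```r
library(boot)

# Hypothetical statistic: the lag-k sample autocorrelation.  'k' is an
# extra argument that stays constant across bootstrap replicates.
acf.lag <- function(tsb, k) {
    acf(tsb, lag.max = k, plot = FALSE)$acf[k + 1]
}

set.seed(1)
# 'k' is not an argument of tsboot, so it is handed on to acf.lag
# through '...'.
lynx.acf <- tsboot(log(lynx), acf.lag, R = 49, l = 20, sim = "fixed",
                   k = 1)

lynx.acf$t0       # statistic applied to the original series
dim(lynx.acf$t)   # one row per replicate, one column per statistic
```

     Because 'k' is fixed in the call, every replicate is evaluated
     at the same lag, as the description of 'statistic' requires.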

_D_e_t_a_i_l_s:

     If 'sim' is '"fixed"' then each replicate time series is found by
     taking blocks of length 'l' from the original time series and
     putting them end-to-end until a new series of length 'n.sim' is
     created.  When 'sim' is '"geom"' a similar approach is taken
     except that now the block lengths are generated from a geometric
     distribution with mean 'l'.  Post-blackening can be carried out
     on these replicate time series by including the function
     'ran.gen' in the call to 'tsboot' and having 'tseries' as a time
     series of residuals.
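
     The fixed-block scheme can be sketched in a few lines of base R.
     This is a simplified illustration without end corrections and is
     not the code used by 'tsboot'; 'block.resample' is a hypothetical
     helper introduced here.

```r
# Simplified fixed-block resampling: block starts are drawn uniformly
# from the positions at which a whole block of length 'l' still fits.
block.resample <- function(x, l, n.sim = length(x)) {
    n <- length(x)
    n.blocks <- ceiling(n.sim / l)             # enough blocks to cover n.sim
    starts <- sample(n - l + 1, n.blocks, replace = TRUE)
    idx <- unlist(lapply(starts, function(s) s:(s + l - 1)))
    x[idx[seq_len(n.sim)]]                     # trim the last block if needed
}

set.seed(1)
x <- as.numeric(log(lynx))
x.star <- block.resample(x, l = 20)
length(x.star)   # 114, the same length as x
```

     Within each block the original ordering, and hence the short-range
     dependence, is kept; dependence is broken only at the joins
     between blocks.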

     Model based resampling is very similar to the parametric bootstrap
     and all simulation must be done in the user-supplied function
     'ran.gen'.  This avoids the complicated problem of choosing the
     block length but relies on an accurate model choice being made.
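
     A minimal sketch of model based resampling, assuming an AR(1)
     model with Gaussian innovations is adequate (an assumption made
     only for this illustration).  The names 'ar1.gen' and 'ar1.args'
     are hypothetical.

```r
library(boot)

# Fit an AR(1) model; its estimates are handed to the generator
# through 'ran.args'.
fit <- arima(log(lynx), order = c(1, 0, 0))
ar1.args <- list(ar = unname(coef(fit)["ar1"]),
                 mu = unname(coef(fit)["intercept"]),
                 sd = sqrt(fit$sigma2))

# Hypothetical generator with the three arguments tsboot expects.
# The first argument is ignored because each replicate is simulated
# afresh from the fitted model.
ar1.gen <- function(tseries, n.sim, ran.args) {
    ran.args$mu + arima.sim(model = list(ar = ran.args$ar),
                            n = n.sim, sd = ran.args$sd)
}

set.seed(1)
lynx.ar1 <- tsboot(log(lynx), mean, R = 49, sim = "model",
                   ran.gen = ar1.gen, ran.args = ar1.args)
sd(lynx.ar1$t[, 1])   # bootstrap standard error of the mean
```

     No block length is involved; the quality of the replicates rests
     entirely on how well the assumed model describes the series.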

     Phase scrambling is described in Section 8.2.4 of Davison and
     Hinkley (1997).  The types of statistic for which this method
     produces reasonable results are very limited and the other methods
     seem to do better in most situations.  Other types of resampling in
     the frequency domain can be accomplished using the function 'boot'
     with the argument 'sim="parametric"'.
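
     A phase scrambling call might look as follows.  This is only a
     sketch: the statistic 'acf1' is a hypothetical example, and no
     claim is made that it is one of the statistics for which the
     method works well.

```r
library(boot)

# Hypothetical statistic: the lag-1 sample autocorrelation.
acf1 <- function(tsb) acf(tsb, lag.max = 1, plot = FALSE)$acf[2]

set.seed(1)
# Phase scrambling with normal margins (norm = TRUE, the default) ...
lynx.scr <- tsboot(log(lynx), acf1, R = 49, sim = "scramble")
# ... and with margins matching the empirical margins of the data.
lynx.scr2 <- tsboot(log(lynx), acf1, R = 49, sim = "scramble",
                    norm = FALSE)

lynx.scr$t0                               # observed lag-1 autocorrelation
c(sd(lynx.scr$t[, 1]), sd(lynx.scr2$t[, 1]))   # spread of the replicates
```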

_V_a_l_u_e:

     An object of class '"boot"' with the following components.

      t0: If 'orig.t' is 'TRUE' then 't0' is the result of
          'statistic(tseries, ...)'; otherwise it is 'NULL'. 

       t: The results of applying 'statistic' to the replicate time
          series.  

       R: The value of 'R' as supplied to 'tsboot'. 

 tseries: The original time series. 

statistic: The function 'statistic' as supplied. 

     sim: The simulation type used in generating the replicates. 

 endcorr: The value of 'endcorr' used.  The value is meaningful only
          when 'sim' is '"fixed"'; it is ignored for model based
          simulation or phase scrambling and is always set to 'TRUE' if
          'sim' is '"geom"'. 

   n.sim: The value of 'n.sim' used. 

       l: The value of 'l' used for block based resampling.  This will
          be 'NULL' if  block based resampling was not used. 

 ran.gen: The 'ran.gen' function used for generating the series or for
          "post-blackening". 

ran.args: The extra arguments passed to 'ran.gen'. 

    call: The original call to 'tsboot'. 
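
     The components can be inspected directly on the returned object.
     A short sketch (the object name 'b' is arbitrary):

```r
library(boot)

set.seed(1)
b <- tsboot(log(lynx), mean, R = 49, l = 20, sim = "geom")

class(b)    # "boot"
b$t0        # statistic on the original series, since orig.t = TRUE
dim(b$t)    # one row per bootstrap replicate
b$R         # number of replicates requested
b$sim       # simulation type used
b$l         # mean block length used
```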

_R_e_f_e_r_e_n_c_e_s:

     Davison, A.C. and Hinkley, D.V. (1997)  _Bootstrap Methods and
     Their Application_. Cambridge University Press.

     Künsch, H.R. (1989) The jackknife and the bootstrap for general
     stationary observations. _Annals of Statistics_, *17*, 1217-1241.

     Politis, D.N. and Romano, J.P. (1994) The stationary bootstrap. 
     _Journal of the American Statistical Association_, *89*,
     1303-1313.

_S_e_e _A_l_s_o:

     'boot', 'arima.sim'

_E_x_a_m_p_l_e_s:

     lynx.fun <- function(tsb) 
     {    ar.fit <- ar(tsb, order.max=25)
          c(ar.fit$order, mean(tsb), tsb)
     }

     # the stationary bootstrap with mean block length 20
     lynx.1 <- tsboot(log(lynx), lynx.fun, R=99, l=20, sim="geom")

     # the fixed block bootstrap with length 20
     lynx.2 <- tsboot(log(lynx), lynx.fun, R=99, l=20, sim="fixed")

     # Now for model based resampling we need the original model
     # Note that for all of the bootstraps which use the residuals as their
     # data, we set orig.t to FALSE since the function applied to the residual
     # time series will be meaningless.
     lynx.ar <- ar(log(lynx))
     lynx.model <- list(order=c(lynx.ar$order,0,0),ar=lynx.ar$ar)
     lynx.res <- lynx.ar$resid[!is.na(lynx.ar$resid)]
     lynx.res <- lynx.res - mean(lynx.res)

     lynx.sim <- function(res,n.sim, ran.args) {
     # random generation of replicate series using arima.sim 
          rg1 <- function(n, res)
               sample(res, n, replace=TRUE)
          ts.orig <- ran.args$ts
          ts.mod <- ran.args$model
          mean(ts.orig)+ts(arima.sim(model=ts.mod, n=n.sim,
                           rand.gen=rg1, res=as.vector(res)))
     }

     lynx.3 <- tsboot(lynx.res, lynx.fun, R=99, sim="model", n.sim=114,
                      orig.t=FALSE, ran.gen=lynx.sim, 
                      ran.args=list(ts=log(lynx), model=lynx.model))

     #  For "post-blackening" we need to define another function
     lynx.black <- function(res, n.sim, ran.args) 
     {    ts.orig <- ran.args$ts
          ts.mod <- ran.args$model
          mean(ts.orig) + ts(arima.sim(model=ts.mod,n=n.sim,innov=res))
     }

     # Now we can apply the two types of block resampling again, this
     # time with post-blackening.  Note that lynx.1b uses fixed blocks
     # and lynx.2b geometric block lengths.
     lynx.1b <- tsboot(lynx.res, lynx.fun, R=99, l=20, sim="fixed",
                       n.sim=114, orig.t=FALSE, ran.gen=lynx.black, 
                       ran.args=list(ts=log(lynx), model=lynx.model))

     lynx.2b <- tsboot(lynx.res, lynx.fun, R=99, l=20, sim="geom",
                       n.sim=114, orig.t=FALSE, ran.gen=lynx.black, 
                       ran.args=list(ts=log(lynx), model=lynx.model))

     # To compare the observed order of the bootstrap replicates we
     # proceed as follows.
     table(lynx.1$t[,1])
     table(lynx.1b$t[,1])
     table(lynx.2$t[,1])
     table(lynx.2b$t[,1])
     table(lynx.3$t[,1])
     # Notice that the post-blackened and model-based bootstraps preserve
     # the true order of the model (11) in many more cases than the others.
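
     The comparison of orders can also be reduced to a single number
     per bootstrap.  The following self-contained sketch uses a
     hypothetical statistic 'order.fun' that returns only the selected
     AR order, and reports the proportion of replicates whose order
     matches that of the original series.

```r
library(boot)

# Hypothetical statistic: the AR order selected by ar() for a series.
order.fun <- function(tsb) ar(tsb, order.max = 25)$order

set.seed(1)
lynx.ord <- tsboot(log(lynx), order.fun, R = 49, l = 20, sim = "fixed")

# Proportion of replicates reproducing the order chosen for the
# original series (stored in t0 because orig.t = TRUE).
mean(lynx.ord$t[, 1] == lynx.ord$t0)
```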

