empinf                 package:boot                 R Documentation

_E_m_p_i_r_i_c_a_l _I_n_f_l_u_e_n_c_e _V_a_l_u_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function calculates the empirical influence values for a
     statistic applied to a data set.  It allows four types of
     calculation, namely the infinitesimal jackknife (using numerical
     differentiation), the usual jackknife estimates, the 'positive'
     jackknife estimates and a method which estimates the empirical
     influence values using regression of bootstrap replicates of the
     statistic.  All methods can be used with one or more samples.

_U_s_a_g_e:

     empinf(boot.out = NULL, data = NULL, statistic = NULL,
            type = NULL, stype = NULL, index = 1, t = NULL,
            strata = rep(1, n), eps = 0.001, ...)

_A_r_g_u_m_e_n_t_s:

boot.out: A bootstrap object created by the function 'boot'.  If 'type'
          is '"reg"' then this argument is required.  For any of the
          other types it is an optional argument.  If it is included
          when optional then the values of 'data', 'statistic',
          'stype', and 'strata' are taken from the components of
          'boot.out' and any values passed to 'empinf' directly are
          ignored. 

    data: A vector, matrix or data frame containing the data for which
          empirical influence values are required.  It is a required
          argument if 'boot.out' is not supplied.  If 'boot.out' is
          supplied then 'data' is set to 'boot.out$data' and any value
          supplied is ignored. 

statistic: The statistic for which empirical influence values are
          required.  It must be a function of at least two arguments,
          the data set and a vector of weights, frequencies or indices.
           The nature of the second argument is given by the value of
          'stype'.  Any other arguments that it takes must be supplied
          to 'empinf' and will be passed to 'statistic' unchanged. This
          is a required argument if 'boot.out' is not supplied,
          otherwise its value is taken from 'boot.out' and any value
          supplied here will be ignored. 

    type: The calculation type to be used for the empirical influence
          values. Possible values of 'type' are '"inf"' (infinitesimal
          jackknife), '"jack"' (usual jackknife), '"pos"' (positive
          jackknife), and '"reg"' (regression estimation).  The default
          value depends on the other arguments.  If 't' is supplied
          then the default value of 'type' is '"reg"' and 'boot.out'
          should be present so that its frequency array can be found. 
          If 't' is not supplied then if 'stype' is '"w"', the default
          value of 'type' is '"inf"'; otherwise, if 'boot.out' is
          present the default is '"reg"'.  If none of these conditions
          apply then the default is '"jack"'.  Note that it is an error
          for 'type' to be '"reg"' if 'boot.out' is missing or to be 
          '"inf"' if 'stype' is not '"w"'. 

   stype: A character variable giving the nature of the second argument
          to 'statistic'. It can take on three values: '"w"' (weights),
          '"f"' (frequencies), or '"i"' (indices).  If 'boot.out' is
          supplied the value of 'stype' is set to 'boot.out$stype' and
          any value supplied here is ignored. Otherwise it is an
          optional argument which defaults to '"w"'. If 'type' is
          '"inf"' then 'stype' MUST be '"w"'. 

   index: An integer giving the position of the variable of interest in
          the output of 'statistic'. 

       t: A vector of length 'boot.out$R' which gives the bootstrap
          replicates of the statistic of interest.  't' is used only
          when 'type' is '"reg"', and it defaults to
          'boot.out$t[,index]'.

  strata: An integer vector or a factor specifying the strata for
          multi-sample problems.  If 'boot.out' is supplied, the value
          of 'strata' is set to 'boot.out$strata'.  Otherwise it is an
          optional argument whose default corresponds to the
          single-sample situation.

     eps: This argument is used only if 'type' is '"inf"'.  In that
          case the value of epsilon to be used for numerical
          differentiation will be 'eps' divided by the number of
          observations in 'data'. 

     ...: Any other arguments that 'statistic' takes.  They will be
          passed unchanged to 'statistic' every time that it is called. 
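
     As an illustration of the role of 'eps', the following base R
     sketch differentiates a ratio statistic numerically with a step of
     'eps' divided by the number of observations, as described for the
     'eps' argument above, and compares the result with the analytic
     influence values.  The data, the helper 'num.inf' and the exact
     perturbation scheme are assumptions made for this illustration;
     they are not taken from the source of 'empinf'.

```r
# Hypothetical data for illustration only.
u <- c(10, 20, 30, 40, 50)
x <- c(12, 25, 33, 48, 62)
n <- length(u)

# Ratio of weighted means, written in weighted form (stype = "w").
ratio.w <- function(w) sum(x * w) / sum(u * w)

# Hypothetical helper: forward-difference derivative of the statistic
# when a small amount of mass is moved onto observation j.
num.inf <- function(eps) {
    e <- eps / n                 # step size: eps / number of observations
    w0 <- rep(1 / n, n)          # equal weights for the observed sample
    t0 <- ratio.w(w0)
    sapply(seq_len(n), function(j) {
        w <- (1 - e) * w0
        w[j] <- w[j] + e
        (ratio.w(w) - t0) / e
    })
}

# Analytic empirical influence values of the ratio: (x - t * u) / mean(u).
l.exact <- (x - ratio.w(rep(1 / n, n)) * u) / mean(u)

l.small <- num.inf(0.001)   # the documented default value of 'eps'
l.large <- num.inf(0.1)     # a deliberately coarse step

# Largest absolute discrepancy from the analytic values for each step.
max(abs(l.small - l.exact))
max(abs(l.large - l.exact))
```

     For a smooth statistic such as this one, the smaller step tracks
     the analytic values more closely; a very small step could
     eventually suffer from rounding error instead.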

_D_e_t_a_i_l_s:

     If 'type' is '"inf"' then numerical differentiation is used to
     approximate the empirical influence values.  This makes sense only
     for statistics which are written in weighted form (i.e. 'stype' is
     '"w"').  If 'type' is '"jack"' then the usual leave-one-out
     jackknife estimates of the empirical influence are returned.  If
     'type' is '"pos"' then the positive (include-one-twice) jackknife
     values are used.  If 'type' is '"reg"' then a bootstrap object
     must be supplied. The regression method then works by regressing
     the bootstrap replicates of 'statistic' on the frequency array
     from which they were derived. The bootstrap frequency array is
     obtained through a call to 'boot.array'.  Further details of the
     methods are given in Section 2.7 of Davison and Hinkley (1997).
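
     The four calculations described above can be sketched in base R
     for the simplest case, the sample mean of a single sample, where
     every method should recover the same values, 'x - mean(x)'.  This
     is a minimal sketch under stated assumptions and not the code used
     by 'empinf': the data are hypothetical, the jackknife scalings
     '(n - 1)' and '(n + 1)' are the usual textbook ones, and the
     frequency array is simulated directly with 'rmultinom' rather than
     obtained from 'boot.array'.

```r
set.seed(1)
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.8)   # hypothetical data
n <- length(x)

wmean <- function(d, w) sum(d * w) / sum(w)   # statistic in weighted form
w0 <- rep(1 / n, n)
t0 <- wmean(x, w0)

# Infinitesimal jackknife: numerical derivative with step eps / n.
e <- 0.001 / n
l.inf <- sapply(seq_len(n), function(j) {
    w <- (1 - e) * w0
    w[j] <- w[j] + e
    (wmean(x, w) - t0) / e
})

# Usual (leave-one-out) jackknife.
l.jack <- sapply(seq_len(n), function(j) (n - 1) * (t0 - mean(x[-j])))

# Positive (include-one-twice) jackknife.
l.pos <- sapply(seq_len(n), function(j) (n + 1) * (mean(c(x, x[j])) - t0))

# Regression: regress bootstrap replicates on resampling frequencies.
R <- 200
f <- t(rmultinom(R, size = n, prob = w0))   # R x n frequency array
t.star <- as.vector(f %*% x) / n            # bootstrap replicates of the mean
X <- f[, -n] / n                            # drop one column: rows sum to 1
beta <- unname(coef(lm(t.star ~ X))[-1])
l.reg <- c(beta, 0)
l.reg <- l.reg - mean(l.reg)                # centre the values at zero

cbind(l.inf, l.jack, l.pos, l.reg, exact = x - mean(x))
```

     The agreement is exact only because the mean is linear in the
     weights; for a nonlinear statistic the four sets of values would
     generally differ slightly.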

     Empirical influence values are used frequently in nonparametric
     bootstrap applications, so many other functions call 'empinf'
     when they need them.  Examples include nonparametric delta
     estimates of variance, BCa intervals and linear approximations
     to statistics for use as control variates.  They are also used
     for antithetic bootstrap resampling.
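
     To make the first of these uses concrete, the sketch below
     computes a nonparametric delta method variance estimate from
     empirical influence values for a single sample, using the standard
     formula 'sum(l^2) / n^2'.  The data are hypothetical and the
     influence values of the mean are written down analytically; this
     page does not state how 'var.linear' computes its result, so no
     claim is made that it matches this sketch in every case.

```r
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.8)   # hypothetical data
n <- length(x)

# Empirical influence values of the sample mean.
l <- x - mean(x)

# Nonparametric delta method estimate of the variance of the mean.
v.L <- sum(l^2) / n^2

# For the mean this is the usual variance estimate var(x) / n,
# scaled by (n - 1) / n.
v.usual <- var(x) / n
c(delta = v.L, scaled.usual = (n - 1) / n * v.usual)
```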

_V_a_l_u_e:

     A vector of the empirical influence values of 'statistic' applied
     to 'data'.  The values will be in the same order as the
     observations in 'data'.
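
     A short usage sketch of the return value, assuming the 'boot'
     package and its 'city' data set are installed: the result has one
     entry per row of the data, and leaving 'type' unset with 'stype'
     equal to '"w"' is expected to give the same values as asking for
     '"inf"' explicitly, following the default rules described under
     'type'.  The index-based statistic 'ratio.i' is a hypothetical
     helper introduced only for this illustration.

```r
library(boot)   # assumed to be installed; provides empinf and city

ratio <- function(d, w) sum(d$x * w) / sum(d$u * w)

# 'type' omitted: with stype = "w" the documented default is "inf".
L.default <- empinf(data = city, statistic = ratio, stype = "w")
L.inf <- empinf(data = city, statistic = ratio, stype = "w", type = "inf")

# One value per row of the data, in row order.
length(L.default) == nrow(city)
isTRUE(all.equal(as.vector(L.default), as.vector(L.inf)))

# Usual jackknife for the same statistic written in terms of indices.
ratio.i <- function(d, i) sum(d$x[i]) / sum(d$u[i])
L.jack <- empinf(data = city, statistic = ratio.i, stype = "i",
                 type = "jack")
length(L.jack) == nrow(city)
```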

_W_a_r_n_i_n_g:

     All arguments to 'empinf' must be passed using the 'name = value'
     convention.  If this is not followed then unpredictable errors can
     occur.

_R_e_f_e_r_e_n_c_e_s:

     Davison, A.C. and Hinkley, D.V. (1997) _Bootstrap Methods and
     Their Application_. Cambridge University Press.

     Efron, B. (1982) _The Jackknife, the Bootstrap and Other
     Resampling Plans_. CBMS-NSF Regional Conference Series in Applied
     Mathematics, *38*, SIAM.

     Fernholz, L.T. (1983) _von Mises Calculus for Statistical
     Functionals_. Lecture Notes in Statistics, *19*, Springer-Verlag.

_S_e_e _A_l_s_o:

     'boot', 'boot.array', 'boot.ci', 'control', 'jack.after.boot',
     'linear.approx', 'var.linear'

_E_x_a_m_p_l_e_s:

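     # The empirical influence values for the ratio of means in
     # the city data.
     library(boot)  # provides empinf, boot and the city data
     ratio <- function(d, w) sum(d$x * w) / sum(d$u * w)
     # No boot.out is given and stype defaults to "w", so type
     # defaults to "inf".
     empinf(data = city, statistic = ratio)
     city.boot <- boot(city, ratio, R = 499, stype = "w")
     # Regression estimates based on the bootstrap replicates.
     empinf(boot.out = city.boot, type = "reg")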

     # A statistic that may be of interest in the difference of means
     # problem is the t-statistic for testing equality of means.  In
     # the bootstrap we get replicates of the difference of means and
     # the variance of that statistic and then want to use this output
     # to get the empirical influence values of the t-statistic.
     grav1 <- gravity[as.numeric(gravity[, 2]) >= 7, ]
     grav.fun <- function(dat, w) {
          strata <- tapply(dat[, 2], as.numeric(dat[, 2]))
          d <- dat[, 1]
          ns <- tabulate(strata)
          # Normalise the weights to sum to one within each stratum.
          w <- w / tapply(w, strata, sum)[strata]
          mns <- tapply(d * w, strata, sum)      # weighted means
          mn2 <- tapply(d * d * w, strata, sum)  # weighted second moments
          s2hat <- sum((mn2 - mns^2) / ns)
          c(mns[2] - mns[1], s2hat)
     }

     grav.boot <- boot(grav1, grav.fun, R = 499, stype = "w",
                       strata = grav1[, 2])

     # Since the statistic of interest is a function of the bootstrap
     # statistics, we must calculate the bootstrap replicates and pass
     # them to empinf using the t argument.
     grav.z <- (grav.boot$t[, 1] - grav.boot$t0[1]) / sqrt(grav.boot$t[, 2])
     empinf(boot.out = grav.boot, t = grav.z)

