runmed                 package:stats                 R Documentation

_R_u_n_n_i_n_g _M_e_d_i_a_n_s - _R_o_b_u_s_t _S_c_a_t_t_e_r _P_l_o_t _S_m_o_o_t_h_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     Compute running medians of odd span.  This is the "most robust"
     scatter plot smoothing possible.  For efficiency (and historical
     reasons), you can use one of two different algorithms giving
     identical results.

_U_s_a_g_e:

     runmed(x, k, endrule = c("median","keep","constant"),
            algorithm = NULL, print.level = 0)

_A_r_g_u_m_e_n_t_s:

       x: numeric vector, the "dependent" variable to be smoothed.

       k: integer width of median window; must be odd.  Turlach had a
          default of 'k <- 1 + 2 * min((n-1)%/% 2, ceiling(0.1*n))'.
          Use 'k = 3' for "minimal" robust smoothing eliminating
          isolated outliers.

 endrule: character string indicating how the values at the beginning
          and the end (of the data) should be treated.

          '"keep"' keeps the first and last k2 values unchanged,
               where k2 is the half-bandwidth 'k2 = k %/% 2', i.e.,
               'y[j] = x[j]' for j = 1,..,k2 and (n-k2+1),..,n;

          '"constant"' copies 'median(y[1:k2])' to the first values and
               analogously for the last ones, making the smoothed ends
               _constant_;

          '"median"' the default, smooths the ends by using
               symmetrical medians of successively smaller bandwidth,
               except for the very first and last values, where Tukey's
               robust end-point rule is applied; see 'smoothEnds'.

algorithm: character string (partially matching '"Turlach"' or
          '"Stuetzle"') or the default 'NULL', specifying which
          algorithm should be applied.  The default choice depends on
          'n = length(x)' and 'k' where '"Turlach"' will be used for
          larger problems.

print.level: integer, indicating the verbosity of the algorithm; should
          rarely be changed by average users.
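
     A minimal sketch of how 'k' and 'endrule = "keep"' interact.  The
     helper 'turlach_k' is hypothetical (it is not part of 'stats') and
     only evaluates the default span quoted above; the data are made
     up for illustration.

```r
## Hypothetical helper, not part of 'stats': the default span
## attributed to Turlach in the description of 'k'.
turlach_k <- function(n) 1 + 2 * min((n - 1) %/% 2, ceiling(0.1 * n))

x  <- c(3, 9, 4, 5, 100, 6, 7, 2, 8, 5, 6, 1, 7)  # made-up data, outlier at x[5]
n  <- length(x)
k  <- turlach_k(n)   # odd by construction (1 + 2 * integer)
k2 <- k %/% 2        # half-bandwidth

y_keep <- runmed(x, k, endrule = "keep")

## With endrule = "keep", the first and last k2 values are copied from x.
ends <- c(seq_len(k2), (n - k2 + 1):n)
identical(as.vector(y_keep)[ends], x[ends])
```

     Replacing '"keep"' by '"constant"' or '"median"' changes only the
     values at the indices in 'ends'; the interior is the same for all
     three rules.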

_D_e_t_a_i_l_s:

     Apart from the end values, the result 'y = runmed(x, k)' simply
     has 'y[j] = median(x[(j-k2):(j+k2)])' (k = 2*k2+1), computed very
     efficiently.
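
     The identity above can be checked against a naive computation.
     This is only a sketch: the seed, the sample size and 'k = 7' are
     arbitrary choices, and the naive loop is meant as a reference, not
     as a substitute for 'runmed'.

```r
set.seed(1)          # arbitrary seed, for reproducibility only
x  <- rnorm(50)
k  <- 7
k2 <- k %/% 2
n  <- length(x)

y <- runmed(x, k)

## Naive O(n * k) reference for the interior points j = k2+1, ..., n-k2;
## the end values depend on 'endrule' and are not compared here.
interior <- (k2 + 1):(n - k2)
y_naive  <- sapply(interior, function(j) median(x[(j - k2):(j + k2)]))

all.equal(as.vector(y)[interior], y_naive)
```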

     The two algorithms are internally entirely different:

     "Turlach" is the Härdle-Steiger algorithm (see Ref.) as
          implemented by Berwin Turlach.  A tree algorithm is used,
          ensuring O(n * log(k)) performance, where 'n <- length(x)',
          which is asymptotically optimal.

     "Stuetzle" is the (older) Stuetzle-Friedman implementation which
          makes use of median _updating_ when one observation enters
          and one leaves the smoothing window.  While this runs in
          O(n * k), which is slower asymptotically, it is considerably
          faster for small k or n.
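
     Since both algorithms are documented to give identical results,
     the choice only affects speed.  A small sketch comparing them on
     simulated data (seed, 'n' and 'k' are arbitrary):

```r
set.seed(2)          # arbitrary seed, for reproducibility only
x <- rnorm(1000)

y_T <- runmed(x, 21, algorithm = "Turlach")
y_S <- runmed(x, 21, algorithm = "Stuetzle")

## Both calls should return the same smoothed values.
all.equal(as.vector(y_T), as.vector(y_S))
```

     Wrapping each call in 'system.time()' gives a rough idea of the
     relative cost for a given 'n' and 'k'; timings depend on the
     machine and are not shown here.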

_V_a_l_u_e:

     vector of smoothed values of the same length as 'x' with an
     'attr'ibute 'k' containing (the 'oddified') 'k'.
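
     A short sketch of inspecting the result (made-up data).  The last
     call assumes that an even 'k' is adjusted to an odd value with a
     warning, as "oddified" suggests; this behaviour may differ between
     R versions.

```r
x <- c(1, 5, 2, 8, 3, 9, 4, 7)   # made-up data
y <- runmed(x, 3)

length(y) == length(x)   # same length as the input
attr(y, "k")             # the span actually used, here 3
as.vector(y)             # drop the attribute to get a plain numeric vector

## Assumption: an even 'k' is "oddified" (with a warning) rather than
## rejected; inspect the attribute to see which span was used.
y4 <- suppressWarnings(runmed(x, 4))
attr(y4, "k")
```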

_A_u_t_h_o_r(_s):

     Martin Maechler maechler@stat.math.ethz.ch, based on Fortran code
     from Werner Stuetzle and S-plus and C code from Berwin Turlach.

_R_e_f_e_r_e_n_c_e_s:

     Härdle, W. and Steiger, W. (1995) [Algorithm AS 296] Optimal
     median smoothing, _Applied Statistics_ *44*, 258-264.

     Jerome H. Friedman and Werner Stuetzle (1982) _Smoothing of
     Scatterplots_; Report, Dep. Statistics, Stanford U., Project Orion
     003.

     Martin Maechler (2003) Fast Running Medians: Finite Sample and
     Asymptotic Optimality; working paper available from the author.

_S_e_e _A_l_s_o:

     'smoothEnds' which implements Tukey's end point rule and is called
     by default from 'runmed(*, endrule = "median")'. 'smooth' (from
     'eda' package) uses running medians of 3 for its compound
     smoothers.

_E_x_a_m_p_l_e_s:

     example(nhtemp)#> data(nhtemp)
     myNHT <- as.vector(nhtemp)
     myNHT[20] <- 2 * nhtemp[20]
     plot(myNHT, type="b", ylim = c(48,60), main = "Running Medians Example")
     lines(runmed(myNHT, 7), col = "red")

     ## special: multiple y values for one x
     data(cars)
     plot(cars, main = "'cars' data and runmed(dist, 3)")
     lines(cars, col = "light gray", type = "c")
     with(cars, lines(speed, runmed(dist, k = 3), col = 2))

     ## nice quadratic with a few outliers
     y <- ys <- (-20:20)^2
     y [c(1,10,21,41)] <- c(150, 30, 400, 450)
     all(y == runmed(y, 1)) # 1-neighborhood <==> interpolation
     plot(y) ## lines(y, lwd=.1, col="light gray")
     lines(lowess(seq(y),y, f = .3), col = "brown")
     lines(runmed(y, 7), lwd=2, col = "blue")
     lines(runmed(y,11), lwd=2, col = "red")

     ## Lowess is not robust
     y <- ys ; y[21] <- 6666 ; x <- seq(y)
     col <- c("black", "brown","blue")
     plot(y, col=col[1])
     lines(lowess(x,y, f = .3), col = col[2])
     lines(runmed(y, 7),      lwd=2, col = col[3])
     legend(length(y),max(y), c("data", "lowess(y, f = 0.3)", "runmed(y, 7)"),
            xjust = 1, col = col, lty = c(0, 1,1), pch = c(1,NA,NA))

