cdplot               package:graphics               R Documentation

_C_o_n_d_i_t_i_o_n_a_l _D_e_n_s_i_t_y _P_l_o_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     Computes and plots conditional densities describing how the
     conditional distribution of a categorical variable 'y' changes
     over a numerical variable 'x'.

_U_s_a_g_e:

     cdplot(x, ...)

     ## Default S3 method:
     cdplot(x, y,
       plot = TRUE, tol.ylab = 0.05, ylevels = NULL,
       bw = "nrd0", n = 512, from = NULL, to = NULL,
       col = NULL, border = 1, main = "", xlab = NULL, ylab = NULL,
       yaxlabels = NULL, xlim = NULL, ylim = c(0, 1), ...)

     ## S3 method for class 'formula':
     cdplot(formula, data = list(),
       plot = TRUE, tol.ylab = 0.05, ylevels = NULL,
       bw = "nrd0", n = 512, from = NULL, to = NULL,
       col = NULL, border = 1, main = "", xlab = NULL, ylab = NULL,
       yaxlabels = NULL, xlim = NULL, ylim = c(0, 1), ...,
       subset = NULL)

_A_r_g_u_m_e_n_t_s:

       x: an object, the default method expects a single numerical
          variable.

       y: a '"factor"' interpreted to be the dependent variable.

 formula: a '"formula"' of type 'y ~ x' with a single dependent
          '"factor"' and a single numerical explanatory variable.

    data: an optional data frame.

    plot: logical. Should the computed conditional densities be
          plotted?

tol.ylab: convenience tolerance parameter for y-axis annotation. If the
          distance between two labels drops under this threshold, they
          are plotted equidistantly.

 ylevels: a character or numeric vector specifying in which order the
          levels of the dependent variable should be plotted.

bw, n, from, to, ...: arguments passed to 'density'.

     col: a vector of fill colors of the same length as 'levels(y)'.
          The default is to call 'gray.colors'.

  border: border color of shaded polygons.

main, xlab, ylab: character strings for annotation.

yaxlabels: character vector for annotation of y axis, defaults to
          'levels(y)'.

xlim, ylim: the range of x and y values with sensible defaults.

  subset: an optional vector specifying a subset of observations to be
          used for plotting.

_D_e_t_a_i_l_s:

     'cdplot' computes the conditional densities of 'x' given the
     levels of 'y' weighted by the marginal distribution of 'y'. The
     densities are derived cumulatively over the levels of 'y'.

     This visualization technique is similar to spinograms (see
     'spineplot') and plots P(y | x) against x. The conditional
     probabilities are not derived by discretization (as in the
     spinogram), but using a smoothing approach via 'density'.
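
     The smoothing approach described above can be sketched by hand for
     a two-level response. This is an illustration under stated
     assumptions, not necessarily the code 'cdplot' runs internally:
     the names 'dx', 'd_no', 'p_no' and 'cond_no' are made up here, and
     reusing the bandwidth and grid of the marginal density for the
     per-level density is a choice made for this sketch.

```r
## Sketch only: re-derives P(fail = "no" | temperature) by hand.
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                 1, 2, 1, 1, 1, 1, 1),
               levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)

## Marginal density of x on a grid of n points
dx <- density(temperature, bw = "nrd0", n = 512)

## Density of x within the first level of y; reusing the bandwidth and
## grid of 'dx' is an assumption made here so the two are comparable
d_no <- density(temperature[fail == "no"], bw = dx$bw, n = 512,
                from = min(dx$x), to = max(dx$x))

## Weight by the marginal proportion of the level and divide by the
## marginal density of x (Bayes' rule) to get P(y = "no" | x)
p_no <- mean(fail == "no")
cond_no <- d_no$y / dx$y * p_no
```

     Plotting 'cond_no' against 'dx$x' should give a curve close to the
     boundary between the two shaded regions drawn by
     'cdplot(fail ~ temperature)'.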

     Note that the estimates of the conditional densities are more
     reliable in high-density regions of x. Conversely, they are less
     reliable in regions with only few x observations.

_V_a_l_u_e:

     The conditional density functions (cumulative over the levels of
     'y') are returned invisibly.
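
     Because the returned functions are cumulative over the levels of
     'y', per-level probabilities are obtained by differencing. The
     sketch below assumes the result is a list holding one function per
     level except the last (consistent with the indexing 'cdens[[1]]'
     used in the examples); the 'iris' data and the names 'cd', 'grid',
     'cum1', 'cum2' and 'probs' are chosen here for illustration only.

```r
## Three-level response: expect nlevels(Species) - 1 = 2 functions
cd <- cdplot(Species ~ Sepal.Length, data = iris, plot = FALSE)

## Evaluate the cumulative conditional probabilities on a small grid
grid <- seq(4.5, 7.5, by = 0.5)
cum1 <- cd[[1]](grid)   # P(setosa | x)
cum2 <- cd[[2]](grid)   # P(setosa or versicolor | x)

## Difference the cumulative curves to recover one column per level
probs <- cbind(setosa     = cum1,
               versicolor = cum2 - cum1,
               virginica  = 1 - cum2)
rownames(probs) <- grid
round(probs, 2)
```

     Each row of 'probs' sums to one by construction, since the last
     level is taken as the remainder.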

_A_u_t_h_o_r(_s):

     Achim Zeileis <Achim.Zeileis@R-project.org>

_R_e_f_e_r_e_n_c_e_s:

     Hofmann, H., Theus, M. (2005), _Interactive graphics for
     visualizing conditional distributions_, Unpublished Manuscript.

_S_e_e _A_l_s_o:

     'spineplot', 'density'

_E_x_a_m_p_l_e_s:

     ## NASA space shuttle o-ring failures
     fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                      1, 2, 1, 1, 1, 1, 1),
                    levels = 1:2, labels = c("no", "yes"))
     temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                      70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)

     ## CD plot
     cdplot(fail ~ temperature)
     cdplot(fail ~ temperature, bw = 2)
     cdplot(fail ~ temperature, bw = "SJ")

     ## compare with spinogram
     (spineplot(fail ~ temperature, breaks = 3))

     ## highlighting for failures
     cdplot(fail ~ temperature, ylevels = 2:1)

     ## scatter plot with conditional density
     cdens <- cdplot(fail ~ temperature, plot = FALSE)
     plot(I(as.numeric(fail) - 1) ~ jitter(temperature, factor = 2),
          xlab = "Temperature", ylab = "Conditional failure probability")
     lines(53:81, 1 - cdens[[1]](53:81), col = 2)

