data                  package:utils                  R Documentation

_D_a_t_a _S_e_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     Loads specified data sets, or lists the available data sets.

_U_s_a_g_e:

     data(..., list = character(0), package = NULL, lib.loc = NULL,
          verbose = getOption("verbose"), envir = .GlobalEnv)

_A_r_g_u_m_e_n_t_s:

     ...: a sequence of names or literal character strings.

    list: a character vector.

 package: a character vector giving the package(s) to look in for data
          sets, or 'NULL'.

          By default, all packages in the search path are used, then
          the 'data' subdirectory (if present) of the current working
          directory. 

 lib.loc: a character vector of directory names of R libraries, or
          'NULL'.  The default value of 'NULL' corresponds to all
          libraries currently known.

 verbose: a logical.  If 'TRUE', additional diagnostics are printed.

   envir: the environment where the data should be loaded.

_D_e_t_a_i_l_s:

     Currently, four formats of data files are supported:


        1.  files ending '.R' or '.r' are 'source()'d in, with the R
           working directory changed temporarily to the directory
           containing the respective file. ('data' ensures that the
           'utils' package is attached, in case it had been run _via_
           'utils::data'.)

        2.  files ending '.RData' or '.rda' are 'load()'ed.

        3.  files ending '.tab', '.txt' or '.TXT' are read using
           'read.table(..., header = TRUE)', and hence result in a data
           frame.

        4.  files ending '.csv' or '.CSV' are read using
           'read.table(..., header = TRUE, sep = ";")', and also result
           in a data frame.

     If more than one matching file name is found, the first on this
     list is used.
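
     As a minimal sketch of format 4, the following writes a small
     semicolon-separated file into the 'data' subdirectory of a
     temporary directory and loads it from there.  The file name
     'toy.csv' and its contents are made up for illustration.

```r
# Hypothetical example data: a semicolon-separated file in <tmp>/data/.
tmp <- tempfile("data-demo-")
dir.create(file.path(tmp, "data"), recursive = TRUE)
writeLines(c("x;y", "1;a", "2;b"), file.path(tmp, "data", "toy.csv"))

old <- setwd(tmp)   # data() looks in ./data of the working directory
env <- new.env()    # load into a private environment, not .GlobalEnv
data(list = "toy", package = character(0), envir = env)
setwd(old)          # restore the previous working directory

str(env$toy)        # a data frame with columns 'x' and 'y'
```

     Because 'sep = ";"' is used, a comma-separated file with a '.csv'
     extension would be read as a single column.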

     The data sets to be loaded can be specified as a sequence of names
     or character strings, or as the character vector 'list', or as
     both.

     For each given data set, the first two types ('.R' or '.r', and
     '.RData' or '.rda' files) can create several variables in the load
     environment, which might all be named differently from the data
     set.  The last two ('.tab', '.txt', or '.TXT', and '.csv' or
     '.CSV' files) will always result in the creation of a single
     variable with the same name as the data set. 
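
     The difference can be seen with a saved image holding two
     objects.  This is a sketch only; the data set name 'pair' and the
     objects 'a' and 'b' are hypothetical.

```r
# Hypothetical data set 'pair' stored as an .rda file holding two objects.
tmp <- tempfile("data-demo-")
dir.create(file.path(tmp, "data"), recursive = TRUE)
a <- 1:3
b <- letters[1:3]
save(a, b, file = file.path(tmp, "data", "pair.rda"))
rm(a, b)

old <- setwd(tmp)
env <- new.env()
data(list = "pair", package = character(0), envir = env)
setwd(old)

ls(env)   # the variables created are 'a' and 'b'; there is no 'pair'
```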

     If no data sets are specified, 'data' lists the available data
     sets.  It looks for a new-style data index in the 'Meta' directory
     or, if this is not found, an old-style '00Index' file in the
     'data' directory of each specified package, and uses these files
     to prepare a listing.  If there is a 'data' area but no index,
     available data files for loading are computed and included in the
     listing, and a warning is given: such packages are incomplete. 
     The information about available data sets is returned in an object
     of class '"packageIQR"'.  The structure of this class is
     experimental.  Where a dataset has a different name from the
     argument that should be used to retrieve it, the index will have
     an entry like 'beaver1 (beavers)', which tells us that dataset
     'beaver1' can be retrieved by the call 'data(beavers)'.
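
     A short sketch of working with the listing follows.  Since the
     structure of the class is experimental, the 'results' component
     and its column names used here are an assumption about the current
     layout rather than a documented interface.

```r
idx <- data(package = "datasets")          # nothing is loaded, only listed
class(idx)
head(idx$results[, c("Item", "Title")])    # assumes the current 'results' layout

# Entries of the form 'object (dataset)' name the argument to pass to data().
grep("beaver", idx$results[, "Item"], value = TRUE)

env <- new.env()
data(beavers, package = "datasets", envir = env)
ls(env)                                    # objects created by data(beavers)
```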

     If 'lib.loc' and 'package' are both 'NULL' (the default), the data
     sets are searched for in all the currently loaded packages then in
     the 'data' directory (if any) of the current working directory.

     If 'lib.loc = NULL' but 'package' is specified as a character
     vector, the specified package(s) are searched for first amongst
     loaded packages and then in the default library/ies (see
     '.libPaths').

     If 'lib.loc' _is_ specified (and not 'NULL'), packages are
     searched for in the specified library/ies, even if they are
     already loaded from another library.

     To just look in the 'data' directory of the current working
     directory, set 'package = character(0)' (and 'lib.loc = NULL', the
     default).
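
     For illustration, the search can be restricted to a single
     package by naming it, leaving 'lib.loc' at its default.  The
     environment 'env' is only a convenience to avoid writing into the
     global environment.

```r
env <- new.env()

# Only the 'datasets' package is searched: first among the loaded
# packages, then in the default libraries (see .libPaths()).
data(list = "USArrests", package = "datasets", envir = env)

dim(env$USArrests)
```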

_V_a_l_u_e:

     a character vector of all data sets specified, or information
     about all available data sets in an object of class '"packageIQR"'
     if none were specified.

_N_o_t_e:

     The data files can be many small files.  On some file systems it
     is desirable to save space, and the files in the 'data' directory
     of an installed package can be zipped up as a zip archive
     'Rdata.zip'.  You will need to provide a single-column file
     'filelist' of file names in that directory.

     One can take advantage of the search order and the fact that a
     '.R' file will change directory.  If raw data are stored in
     'mydata.txt' then one can set up 'mydata.R' to read 'mydata.txt'
     and pre-process it, e.g., using 'transform'. For instance one can
     convert numeric vectors to factors with the appropriate labels. 
     Thus, the '.R' file can effectively contain a metadata
     specification for the plaintext formats.
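
     The arrangement described above might look as follows.  This is a
     sketch under assumptions: the columns 'id' and 'grp' and the
     labels "control" and "treated" are invented for the example.

```r
tmp <- tempfile("data-demo-")
dir.create(file.path(tmp, "data"), recursive = TRUE)

# Raw data: 'grp' is stored as a numeric code.
writeLines(c("id grp", "1 1", "2 2", "3 1"),
           file.path(tmp, "data", "mydata.txt"))

# Companion script: it is found before 'mydata.txt' in the search order
# and is sourced with the working directory set to 'data/', so the
# relative file name resolves.
writeLines(c(
  'mydata <- read.table("mydata.txt", header = TRUE)',
  'mydata <- transform(mydata,',
  '                    grp = factor(grp, levels = 1:2,',
  '                                 labels = c("control", "treated")))'
), file.path(tmp, "data", "mydata.R"))

old <- setwd(tmp)
env <- new.env()
data(list = "mydata", package = character(0), envir = env)
setwd(old)

levels(env$mydata$grp)   # the labels assigned by 'mydata.R'
```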

_S_e_e _A_l_s_o:

     'help' for obtaining documentation on data sets, 'save' for
     _creating_ the second ('.rda') kind of data, typically the most
     efficient one.

_E_x_a_m_p_l_e_s:

     require(utils)
     data()                       # list all available data sets
     try(data(package = "rpart")) # list the data sets in the rpart package
     data(USArrests, "VADeaths")  # load the data sets 'USArrests' and 'VADeaths'
     help(USArrests)              # give information on data set 'USArrests'

