% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/imputation.R
\name{impute_matrix}
\alias{impute_matrix}
\alias{imputeMethods}
\alias{impute_neighbour_average}
\alias{impute_knn}
\alias{impute_mle}
\alias{impute_bpca}
\alias{impute_mixed}
\alias{impute_min}
\alias{impute_zero}
\alias{impute_with}
\alias{impute_RF}
\alias{impute_fun}
\title{Quantitative mass spectrometry data imputation}
\usage{
impute_matrix(x, method, FUN, ...)

imputeMethods()

impute_neighbour_average(x, k = min(x, na.rm = TRUE))

impute_knn(x, ...)

impute_mle(x, ...)

impute_bpca(x, ...)

impute_RF(x, ...)

impute_mixed(x, randna, mar, mnar, ...)

impute_min(x)

impute_zero(x)

impute_with(x, val)

impute_fun(x, FUN, ...)
}
\arguments{
\item{x}{A matrix or an \code{HDF5Matrix} object to be imputed.}

\item{method}{\code{character(1)} defining the imputation method. See
\code{imputeMethods()} for available ones.}

\item{FUN}{A user-provided function that takes a \code{matrix} as input and
returns an imputed \code{matrix} of identical dimensions.}

\item{...}{Additional parameters passed to the inner imputation
function.}

\item{k}{\code{numeric(1)} providing the imputation value used for the
first and last samples if they contain an \code{NA}. The default is
to use the smallest value in the data.}

\item{randna}{\code{logical} of length equal to \code{nrow(x)} defining
which rows are missing at random. The other ones are
considered missing not at random. Only relevant when \code{method}
is \code{mixed}.}

\item{mar}{Imputation method for values missing at random. See
\code{method} above.}

\item{mnar}{Imputation method for values missing not at
random. See \code{method} above.}

\item{val}{\code{numeric(1)} used to replace all missing values.}
}
\value{
A matrix of the same class as \code{x} with dimensions \code{dim(x)}.
}
\description{
The \code{impute_matrix} function performs data imputation on \code{matrix}
objects using a variety of methods (see below).

Users should proceed with care when imputing data and take
precautions to ensure that the imputation produces valid results,
in particular with naive imputations such as replacing missing
values with 0.
}
\details{
There are two types of mechanisms resulting in missing values in
LC/MSMS experiments.
\itemize{
\item Missing values resulting from absence of detection of a feature,
despite ions being present at detectable concentrations, for
example in the case of ion suppression or as a result of the
stochastic, data-dependent nature of the MS acquisition
method. These missing values are expected to be randomly
distributed in the data and are defined as missing at random
(MAR) or missing completely at random (MCAR).
\item Biologically relevant missing values resulting from the absence
or the low abundance of ions (below the limit of detection of
the instrument). These missing values are not expected to be
randomly distributed in the data and are defined as missing not
at random (MNAR).
}

MNAR features should ideally be imputed with a left-censor method,
such as \code{QRILC} below. Conversely, it is recommended to use hot
deck methods such as nearest neighbours, Bayesian missing value
imputation or maximum likelihood methods when values are missing
at random.

Currently, the following imputation methods are available.
\itemize{
\item \emph{MLE}: Maximum likelihood-based imputation method using the EM
algorithm. Implemented in the \code{norm::imp.norm()} function. See
\code{\link[norm:imp.norm]{norm::imp.norm()}} for details and additional parameters. Note
that here, \code{...} are passed to the \code{\link[norm:em.norm]{norm::em.norm()}} function,
rather than to the actual imputation function \code{imp.norm}.
\item \emph{bpca}: Bayesian missing value imputation, as
implemented in the \code{pcaMethods::pca()} function. See
\code{\link[pcaMethods:pca]{pcaMethods::pca()}} for details and additional parameters.
\item \emph{RF}: Random Forest imputation, as implemented in the
\code{missForest::missForest} function. See \code{\link[missForest:missForest]{missForest::missForest()}} for
details and additional parameters.
\item \emph{knn}: Nearest neighbour averaging, as implemented in the
\code{impute::impute.knn} function. See \code{\link[impute:impute.knn]{impute::impute.knn()}} for
details and additional parameters.
\item \emph{QRILC}: A missing data imputation method that performs the
imputation of left-censored missing data using random draws from
a truncated distribution with parameters estimated using
quantile regression. Implemented in the
\code{imputeLCMD::impute.QRILC}
function. See \code{\link[imputeLCMD:impute.QRILC]{imputeLCMD::impute.QRILC()}} for details and
additional parameters.
\item \emph{MinDet}: Performs the imputation of left-censored missing data
using a deterministic minimal value approach. Considering an
expression data matrix with \emph{n} samples and \emph{p} features, for each
sample, the missing entries are replaced with a minimal value
observed in that sample. The minimal value observed is estimated
as being the q-th quantile (default \code{q = 0.01}) of the observed
values in that sample. Implemented in the
\code{imputeLCMD::impute.MinDet} function. See
\code{\link[imputeLCMD:impute.MinDet]{imputeLCMD::impute.MinDet()}} for details and additional
parameters.
\item \emph{MinProb}: Performs the imputation of left-censored missing data
by random draws from a Gaussian distribution centred to a
minimal value. Considering an expression data matrix with \emph{n}
samples and \emph{p} features, for each sample, the mean value of the
Gaussian distribution is set to a minimal observed value in that
sample. The minimal value observed is estimated as being the
q-th quantile (default \code{q = 0.01}) of the observed values in
that sample. The standard deviation is estimated as the median
of the feature standard deviations. Note that when estimating
the standard deviation of the Gaussian distribution, only the
peptides/proteins with more than 50\% recorded values
are considered. Implemented in the \code{imputeLCMD::impute.MinProb}
function. See \code{\link[imputeLCMD:impute.MinProb]{imputeLCMD::impute.MinProb()}} for details and
additional parameters.
\item \emph{min}: Replaces the missing values with the smallest non-missing
value in the data.
\item \emph{zero}: Replaces the missing values with 0.
\item \emph{mixed}: A mixed imputation applying two methods (to be defined
by the user as \code{mar} for values missing at random and \code{mnar} for
values missing not at random, see example) on two M(C)AR/MNAR
subsets of the data (as defined by the user by a \code{randna}
logical, of length equal to \code{nrow(x)}).
\item \emph{nbavg}: Average neighbour imputation for fractions collected
along a fractionation/separation gradient, such as sub-cellular
fractions. The method assumes that the fractions are ordered
along the gradient and is invalid otherwise.

Contiguous sets of \code{NA} values at the beginning and the end of the
quantitation vectors are set to the lowest observed value in the
data or to a user defined value passed as argument \code{k}. Then,
when a missing value is flanked by two non-missing neighbouring
values, it is imputed by the mean of its direct neighbours.
\item \emph{with}: Replaces all missing values with a user-provided value.
\item \emph{none}: No imputation is performed and the missing values are
left untouched. Implemented in case one wants to impute only the
values missing at random, or only those missing not at random, with
the \emph{mixed} method.
}

The \code{imputeMethods()} function returns a vector with valid
imputation method arguments.
}
\examples{

## test data
set.seed(42)
m <- matrix(rlnorm(60), 10)
dimnames(m) <- list(letters[1:10], LETTERS[1:6])
m[sample(60, 10)] <- NA

## available methods
imputeMethods()

impute_matrix(m, method = "zero")

impute_matrix(m, method = "min")

impute_matrix(m, method = "knn")

## same as impute_zero
impute_matrix(m, method = "with", val = 0)

## impute with half of the smallest value
impute_matrix(m, method = "with",
              val = min(m, na.rm = TRUE) * 0.5)
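
## A minimal base-R sketch of the deterministic minimum (MinDet) idea
## described in the details section: within each sample, missing
## entries are replaced by the q-th quantile of the observed values.
## This is an illustration only, not the imputeLCMD implementation.
## Assumptions: samples are stored in columns; the names
## min_det_sketch, md and md_imp are hypothetical.
min_det_sketch <- function(x, q = 0.01) {
    for (j in seq_len(ncol(x))) {
        na <- is.na(x[, j])
        ## q-th quantile of the observed values in sample j
        x[na, j] <- quantile(x[, j], probs = q, na.rm = TRUE)
    }
    x
}

md <- matrix(c(1, NA, 3, 4, 5, NA, 7, 8), nrow = 4)
md_imp <- min_det_sketch(md)
md_imp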

## all but the third and ninth features' missing values
## are the result of random missing values
randna <- rep(TRUE, 10)
randna[c(3, 9)] <- FALSE

impute_matrix(m, method = "mixed",
              randna = randna,
              mar = "knn",
              mnar = "min")


## user provided (random) imputation function
random_imp <- function(x) {
   m <- mean(x, na.rm = TRUE)
   sdev <- sd(x, na.rm = TRUE)
   n <- sum(is.na(x))
   x[is.na(x)] <- rnorm(n, mean = m, sd = sdev)
   x
}

impute_matrix(m, FUN = random_imp)
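
## A minimal base-R sketch of the neighbour averaging (nbavg) rule as
## worded in the details section, applied to a single ordered
## quantitation vector. This is an illustration only and may differ
## from the package's own implementation.
## Assumptions: the values are ordered along the gradient; the names
## nbavg_sketch, frac and frac_imp are hypothetical; k defaults to the
## smallest observed value of the vector itself.
nbavg_sketch <- function(v, k = min(v, na.rm = TRUE)) {
    n <- length(v)
    obs <- which(!is.na(v))
    if (length(obs) == 0) return(v)
    ## leading and trailing runs of NA are set to k
    if (obs[1] > 1) v[seq_len(obs[1] - 1)] <- k
    if (obs[length(obs)] < n) v[(obs[length(obs)] + 1):n] <- k
    ## an interior NA flanked by two non-missing values is replaced
    ## by the mean of its direct neighbours
    for (i in seq_len(n)[-c(1, n)]) {
        if (is.na(v[i]) && !is.na(v[i - 1]) && !is.na(v[i + 1]))
            v[i] <- mean(v[c(i - 1, i + 1)])
    }
    v
}

frac <- c(NA, NA, 2, NA, 4, NA, NA, 7, NA)
## interior runs of two or more NA are left untouched
frac_imp <- nbavg_sketch(frac, k = 1)
frac_imp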
}
\references{
Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown,
Trevor Hastie, Robert Tibshirani, David Botstein and Russ B.
Altman, Missing value estimation methods for DNA microarrays,
Bioinformatics (2001) 17 (6): 520-525.

Oba et al., A Bayesian missing value estimation method for gene
expression profile data, Bioinformatics (2003) 19 (16): 2088-2096.

Cosmin Lazar (2015). imputeLCMD: A collection of methods for
left-censored missing data imputation. R package version
2.0. \url{http://CRAN.R-project.org/package=imputeLCMD}.

Lazar C, Gatto L, Ferro M, Bruley C, Burger T. Accounting for the
Multiple Natures of Missing Values in Label-Free Quantitative
Proteomics Data Sets to Compare Imputation Strategies. J Proteome
Res. 2016 Apr 1;15(4):1116-25. doi:
10.1021/acs.jproteome.5b00981. PubMed PMID:26906401.
}
\author{
Laurent Gatto
}
