% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/matching.R
\name{closest}
\alias{closest}
\alias{common}
\alias{join}
\title{Relaxed Value Matching}
\usage{
closest(
  x,
  table,
  tolerance = Inf,
  ppm = 0,
  duplicates = c("keep", "closest", "remove"),
  nomatch = NA_integer_,
  .check = TRUE
)

common(
  x,
  table,
  tolerance = Inf,
  ppm = 0,
  duplicates = c("keep", "closest", "remove"),
  .check = TRUE
)

join(
  x,
  y,
  tolerance = 0,
  ppm = 0,
  type = c("outer", "left", "right", "inner"),
  .check = TRUE,
  ...
)
}
\arguments{
\item{x}{\code{numeric}, the values to be matched. In contrast to
\code{\link[=match]{match()}} \code{x} has to be sorted in increasing order and must not contain any
\code{NA}.}

\item{table}{\code{numeric}, the values to be matched against. In contrast to
\code{\link[=match]{match()}} \code{table} has to be sorted in increasing order and must not contain
any \code{NA}.}

\item{tolerance}{\code{numeric}, accepted tolerance. Either of length one or
of the same length as \code{x}.}

\item{ppm}{\code{numeric(1)} representing a relative, value-specific
parts-per-million (PPM) tolerance that is added to \code{tolerance}.}

\item{duplicates}{\code{character(1)}, how to handle duplicated matches. Has to be
one of \code{c("keep", "closest", "remove")}. No abbreviations allowed.}

\item{nomatch}{\code{integer(1)}, the value returned if the difference
between the value in \code{x} and the closest value in \code{table} is larger
than \code{tolerance}.}

\item{.check}{\code{logical(1)}, set to \code{FALSE} to turn off the checks for
increasingly sorted \code{x} and \code{table} (or \code{y} for \code{join}).
This should only be done if it is ensured by other means that both vectors
are sorted, see also \code{\link[=closest]{closest()}}.}

\item{y}{\code{numeric}, the values to be joined. Should be sorted.}

\item{type}{\code{character(1)}, defines how \code{x} and \code{y} should be joined. See
details for \code{join}.}

\item{...}{ignored.}
}
\value{
\code{closest} returns an \code{integer} vector of the same length as \code{x}
giving the closest position in \code{table} of the first match or \code{nomatch} if
there is no match.

\code{common} returns a \code{logical} vector of the same length as \code{x} that is
\code{TRUE} if the element in \code{x} was found in \code{table}. It is similar to
\code{\link{\%in\%}}.

\code{join} returns a \code{matrix} with two columns, namely \code{x} and \code{y},
representing the index of the values in \code{x} matching the corresponding value
in \code{y} (or \code{NA} if the value does not match).
}
\description{
These functions offer relaxed matching of one vector in another.
In contrast to the similar \code{\link[=match]{match()}} and \code{\link{\%in\%}} functions they
just accept \code{numeric} arguments but have an additional \code{tolerance}
argument that allows relaxed matching.
}
\details{
For \code{closest}/\code{common} the \code{tolerance} argument can be set to \code{0} to get
the same results as for \code{\link[=match]{match()}}/\code{\link{\%in\%}}. If it is set to \code{Inf} (default)
the index of the closest value is returned without any restriction.

A one-to-one matching is not guaranteed for either the
\code{x} to \code{table} or the \code{table} to \code{x} matching.

If multiple elements in \code{x} match a single element in \code{table} all their
corresponding indices are returned if \code{duplicates="keep"} is set (default).
This behaviour is identical to \code{\link[=match]{match()}}. For \code{duplicates="closest"} just
the closest element in \code{x} gets the corresponding index in \code{table} and
for \code{duplicates="remove"} all elements in \code{x} that match to the same element
in \code{table} are set to \code{nomatch}.

If a single element in \code{x} matches multiple elements in \code{table} the \emph{closest}
is returned for \code{duplicates="keep"} or \code{duplicates="closest"} (\emph{keeping}
multiple matches isn't possible in this case because the return value should
be of the same length as \code{x}). If the differences between \code{x} and the
corresponding matches in \code{table} are identical the lower index (the smaller
element in \code{table}) is returned. There is one exception: if the lower index
is already returned for another \code{x} with a smaller difference to this
index, the higher one is returned for \code{duplicates = "closest"}
(but only if there is no other \code{x} that is closer to the higher one).
For \code{duplicates="remove"} all multiple matches are returned as \code{nomatch} as
above.

\code{.check = TRUE} tests, among other input validation checks, for increasingly
sorted \code{x} and \code{table} arguments, which are mandatory assumptions for the
\code{closest} algorithm. These checks require looping through both vectors and
comparing each element against its predecessor.
Depending on the length and distribution of \code{x} and \code{table} these checks can
take as much time as, or more time than, the whole \code{closest} algorithm. If it is
ensured by other means that both arguments \code{x} and \code{table} are sorted the
tests can be skipped with \code{.check = FALSE}. If \code{.check = FALSE} is used
and one of \code{x} and \code{table} is not sorted (or decreasingly sorted)
the output is incorrect in the best case and the function ends in an infinite
loop in the average and worst case.

\code{join}: joins two \code{numeric} vectors by mapping values in \code{x} to
values in \code{y} and \emph{vice versa} if they are similar enough (given the
specified \code{tolerance} and \code{ppm}). The function returns a \code{matrix} with the
indices of mapped values in \code{x} and \code{y}. The parameter \code{type} defines
how the vectors are joined: \code{type = "left"}: values in \code{x} are
mapped to values in \code{y}, elements in \code{y} not matching any value in \code{x} are
discarded. \code{type = "right"}: same as \code{type = "left"} but for \code{y}.
\code{type = "outer"}: return matches for all values in \code{x} and in \code{y}.
\code{type = "inner"}: report only indices of values that could be mapped.
}
\note{
\code{join} is based on \code{closest(x, y, tolerance, duplicates = "closest")}.
That means that for multiple matches only the closest one is reported.
}
\examples{
## Define two vectors to match
x <- c(1, 3, 5)
y <- 1:10

## Compare match and closest
match(x, y)
closest(x, y)

## If there is no exact match
x <- x + 0.1
match(x, y) # no match
closest(x, y)
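
## With `tolerance = 0` only exact matches are accepted. According to the
## details this should give the same result as match(); `x0` is just an
## illustrative vector introduced for this comparison.
x0 <- c(1, 3.1, 5)
closest(x0, 1:10, tolerance = 0)
match(x0, 1:10)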

## Some new values
x <- c(1.11, 45.02, 556.45)
y <- c(3.01, 34.12, 45.021, 46.1, 556.449)

## Using a single tolerance value
closest(x, y, tolerance = 0.01)

## Using a value-specific tolerance accepting differences of 20 ppm
closest(x, y, ppm = 20)

## Same using 50 ppm
closest(x, y, ppm = 50)
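
## A sketch of how the accepted difference presumably combines both
## arguments. The formula below (tolerance + x * ppm * 1e-6) is an assumption
## derived from the description of `ppm`; `maxdiff` is a hypothetical helper
## and not part of this package.
maxdiff <- function(x, tolerance = 0, ppm = 0) tolerance + x * ppm * 1e-6
## Under this assumption 45.02 and 45.021 differ by more than 20 ppm
## but by less than 50 ppm
abs(45.02 - 45.021) <= maxdiff(45.02, ppm = 20)
abs(45.02 - 45.021) <= maxdiff(45.02, ppm = 50)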

## Sometimes multiple elements in `x` match the same element in `table`
x <- c(1.6, 1.75, 1.8)
y <- 1:2
closest(x, y, tolerance = 0.5)
closest(x, y, tolerance = 0.5, duplicates = "closest")
closest(x, y, tolerance = 0.5, duplicates = "remove")
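
## A single element in `x` can also be equally close to two elements in
## `table`. According to the details the lower index should be returned
## in this case.
closest(1.5, 1:2, tolerance = 0.5)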

## Are there any common values?
x <- c(1.6, 1.75, 1.8)
y <- 1:2
common(x, y, tolerance = 0.5)
common(x, y, tolerance = 0.5, duplicates = "closest")
common(x, y, tolerance = 0.5, duplicates = "remove")
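
## Skipping the input checks: a sketch, assuming the preconditions are
## verified beforehand with base R. `sortedNoNa` is a hypothetical helper
## that covers only the sorting and `NA` requirements, not the other
## input validation done by `.check = TRUE`.
sortedNoNa <- function(v) is.numeric(v) && !anyNA(v) && !is.unsorted(v)
xs <- sort(c(5.1, 1.2, 3.3))
ys <- 1:10
## Only skip the checks if both vectors passed the manual test
if (sortedNoNa(xs) && sortedNoNa(ys)) closest(xs, ys, .check = FALSE)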

## Join two vectors
x <- c(1, 2, 3, 6)
y <- c(3, 4, 5, 6, 7)

jo <- join(x, y, type = "outer")
jo
x[jo$x]
y[jo$y]

jl <- join(x, y, type = "left")
jl
x[jl$x]
y[jl$y]

jr <- join(x, y, type = "right")
jr
x[jr$x]
y[jr$y]

ji <- join(x, y, type = "inner")
ji
x[ji$x]
y[ji$y]
}
\seealso{
\code{\link[=match]{match()}}

\code{\link{\%in\%}}

Other grouping/matching functions: 
\code{\link{bin}()},
\code{\link{gnps}()}
}
\author{
Sebastian Gibb, Johannes Rainer
}
\concept{grouping/matching functions}
