
Core API function for matchedFilter peak detection
Source: R/do_findChromPeaks-functions.R
This function identifies peaks in the chromatographic time domain as
described in Smith 2006. The intensity values are binned by cutting
the LC/MS data into slices (bins) of a fixed width (binSize m/z).
Within each bin the maximal intensity is selected. Peak detection is
then performed for each bin by extending it, based on the steps
parameter, to a slice comprising bins current_bin - steps + 1 to
current_bin + steps - 1. Each of these slices is then filtered with
matched filtration using a second-derivative Gaussian as the model
peak shape. After filtration, peaks are detected using a
signal-to-noise ratio cut-off. For more details and illustrations see
Smith 2006.
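The slice construction and the filtering step can be illustrated with a minimal base R sketch. It is not the xcms implementation: the helper slice_bins(), the clamping of the slice to the available bins, the 1-second time grid, the kernel width of four sigma and the simulated signal are all assumptions made for illustration. The kernel is the negated second derivative of a Gaussian, so that it peaks at its center.

```r
## Hypothetical helper (not part of xcms): indices of the bins that form the
## slice around `current_bin`. Clamping to 1..n_bins is an assumption.
slice_bins <- function(current_bin, steps, n_bins) {
    from <- max(1L, current_bin - steps + 1L)
    to <- min(n_bins, current_bin + steps - 1L)
    seq.int(from, to)
}
bins <- slice_bins(current_bin = 5L, steps = 2L, n_bins = 20L)

## Model peak: negated second derivative of a Gaussian, sampled on a
## symmetric 1-second grid (grid spacing and kernel width are assumptions).
sigma <- 30 / 2.3548
half_width <- ceiling(4 * sigma)
x <- seq(-half_width, half_width)
kernel <- (1 - x^2 / sigma^2) * exp(-x^2 / (2 * sigma^2))

## Simulated chromatogram of one slice: a single Gaussian peak at 150 s.
rt <- 1:300
signal <- 1000 * exp(-(rt - 150)^2 / (2 * sigma^2))

## Centered convolution; positions where the kernel does not fit are NA.
filtered <- as.numeric(stats::filter(signal, kernel, sides = 2))
apex <- rt[which.max(filtered)]
```

In this toy setup the filtered trace has its maximum at the position of the simulated peak, which is what the matched filter is meant to highlight before the signal-to-noise cut-off is applied.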
Usage
do_findChromPeaks_matchedFilter(
mz,
int,
scantime,
valsPerSpect,
binSize = 0.1,
impute = "none",
baseValue,
distance,
fwhm = 30,
sigma = fwhm/2.3548,
max = 5,
snthresh = 10,
steps = 2,
mzdiff = 0.8 - binSize * steps,
index = FALSE,
sleep = 0
)
Arguments
- mz
Numeric vector with the individual m/z values from all scans/spectra of one file/sample.
- int
Numeric vector with the individual intensity values from all scans/spectra of one file/sample.
- scantime
Numeric vector of length equal to the number of spectra/scans of the data representing the retention time of each scan.
- valsPerSpect
Numeric vector with the number of values for each spectrum.
- binSize
numeric(1) specifying the width of the bins/slices in m/z dimension.
- impute
Character string specifying the method to be used for missing value imputation. Allowed values are "none" (no linear interpolation), "lin" (linear interpolation), "linbase" (linear interpolation within a certain bin-neighborhood) and "intlin". See imputeLinInterpol() for more details.
- baseValue
The base value to which empty elements should be set. This is only considered for method = "linbase" and corresponds to the baselevel argument of profBinLinBase().
- distance
For method = "linbase": number of non-empty neighboring elements of an empty element that should be considered for linear interpolation. See details section for more information.
- fwhm
numeric(1) specifying the full width at half maximum of the matched filtration Gaussian model peak. Only used to calculate the actual sigma, see below.
- sigma
numeric(1) specifying the standard deviation (width) of the matched filtration model peak.
- max
numeric(1) representing the maximum number of peaks that are expected/will be identified per slice.
- snthresh
numeric(1) defining the signal-to-noise ratio cutoff.
- steps
numeric(1) defining the number of bins to be merged before filtration (i.e. the number of neighboring bins that will be joined to the slice in which filtration and peak detection will be performed).
- mzdiff
numeric(1) representing the minimum difference in m/z dimension required for peaks with overlapping retention times; can be negative to allow overlap. During peak post-processing, peaks defined to be overlapping are reduced to the one peak with the largest signal.
- index
logical(1) specifying whether indices should be returned instead of values for m/z and retention times.
- sleep
numeric(1) defining the number of seconds to wait between iterations. Defaults to sleep = 0. If > 0, a plot is generated visualizing the identified chromatographic peak. Note: this argument is for backward compatibility only and will be removed in the future.
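As a rough illustration of how the flat input vectors relate to individual spectra, and of how the default values of sigma and mzdiff follow from the other arguments, consider the sketch below. The toy spectra are made up, and scan_index is a hypothetical helper variable; only base R is used.

```r
## Toy data (made up): three spectra with 2, 3 and 1 data points.
spectra_mz <- list(c(100.1, 200.2), c(100.1, 150.0, 200.3), c(100.2))
spectra_int <- list(c(10, 20), c(15, 5, 25), c(12))

## Inputs in the layout the function expects: flat vectors plus the
## number of values per spectrum and one retention time per spectrum.
mz <- unlist(spectra_mz)
int <- unlist(spectra_int)
valsPerSpect <- lengths(spectra_mz)
scantime <- c(1.0, 2.0, 3.0)

## valsPerSpect is what allows the flat vectors to be split back into scans.
scan_index <- rep(seq_along(valsPerSpect), valsPerSpect)
mz_by_scan <- split(mz, scan_index)

## Defaults as given in the Usage section.
binSize <- 0.1
fwhm <- 30
steps <- 2
sigma <- fwhm / 2.3548          # 2.3548 is approximately 2 * sqrt(2 * log(2))
mzdiff <- 0.8 - binSize * steps # evaluates to 0.6 for the default settings
```

Because mzdiff depends on binSize and steps, changing either of them also changes the default mzdiff unless it is set explicitly.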
Value
A matrix, each row representing an identified chromatographic peak, with columns:
- "mz": Intensity weighted mean of m/z values of the peak across scans.
- "mzmin": Minimum m/z of the peak.
- "mzmax": Maximum m/z of the peak.
- "rt": Retention time of the peak's midpoint.
- "rtmin": Minimum retention time of the peak.
- "rtmax": Maximum retention time of the peak.
- "into": Integrated (original) intensity of the peak.
- "intf": Integrated intensity of the filtered peak.
- "maxo": Maximum intensity of the peak.
- "maxf": Maximum intensity of the filtered peak.
- "i": Rank of peak in merged EIC (<= max).
- "sn": Signal to noise ratio of the peak.
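Since the result is a plain numeric matrix with named columns, it can be subset with standard matrix indexing. The sketch below rebuilds a small matrix from two rows and four columns of the example output shown on this page; the threshold of 20 is an arbitrary choice for illustration.

```r
## Two rows copied from the example output of this page (subset of columns).
res <- rbind(c(mz = 205.0000, rt = 2784.635, into = 1778568.9, sn = 28.33394),
             c(mz = 205.9819, rt = 2786.200, into = 237993.6, sn = 16.53987))

## Keep only peaks above an (arbitrary) signal-to-noise value;
## drop = FALSE keeps the matrix structure even for a single row.
strong <- res[res[, "sn"] > 20, , drop = FALSE]

## Retention time width is not part of this subset, but columns that are
## present can be accessed by name.
strong[, "mz"]
```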
Details
The intensities are binned by the provided m/z values within each
spectrum (scan). Binning is performed such that the bins are centered
around the m/z values (i.e. the first bin includes all m/z values between
min(mz) - bin_size/2 and min(mz) + bin_size/2).
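The centered binning can be sketched in base R as follows. This is only an illustration of the idea and not the binYonX() implementation; the toy values, the way the breaks are extended past max(mz), and the variable names are assumptions. As stated in the description, the maximal intensity is kept per bin, and bins without any value remain NA (comparable to impute = "none").

```r
## Toy m/z and intensity values of a single spectrum (made up).
mz <- c(100.00, 100.04, 100.12, 100.31)
int <- c(5, 9, 3, 7)
bin_size <- 0.1

## Bin edges: the first bin spans min(mz) - bin_size/2 to min(mz) + bin_size/2.
breaks <- seq(min(mz) - bin_size / 2, max(mz) + bin_size, by = bin_size)
n_bins <- length(breaks) - 1L
centers <- breaks[-length(breaks)] + bin_size / 2

## Assign each value to its bin and keep the maximal intensity per bin.
bin <- factor(findInterval(mz, breaks), levels = seq_len(n_bins))
binned <- tapply(int, bin, max)
```

Here the first two values fall into the same bin, of which only the larger intensity is kept, and the third bin contains no value at all.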
References
Colin A. Smith, Elizabeth J. Want, Grace O'Maille, Ruben Abagyan and Gary Siuzdak. "XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification" Anal. Chem. 2006, 78:779-787. doi: 10.1021/ac051437y
See also
binYonX() for a binning function, imputeLinInterpol() for the interpolation of missing values.
Other core peak detection functions: do_findChromPeaks_centWave(), do_findChromPeaks_centWaveWithPredIsoROIs(), do_findChromPeaks_massifquant(), do_findPeaks_MSW()
Examples
## Load xcms and the test file
library(xcms)
faahko_sub <- loadXcmsData("faahko_sub")
## Subset to one file and restrict to a certain retention time range
data <- filterRt(filterFile(faahko_sub, 1), c(2500, 3000))
## Get m/z and intensity values
mzs <- mz(data)
ints <- intensity(data)
## Define the values per spectrum:
valsPerSpect <- lengths(mzs)
res <- do_findChromPeaks_matchedFilter(mz = unlist(mzs), int = unlist(ints),
    scantime = rtime(data), valsPerSpect = valsPerSpect)
head(res)
#> mz mzmin mzmax rt rtmin rtmax into intf maxo
#> [1,] 205.0000 205.0 205.0 2784.635 2770.550 2800.284 1778568.9 3610062.2 84280
#> [2,] 205.9819 205.9 206.0 2786.200 2772.115 2800.284 237993.6 448580.3 10681
#> [3,] 207.0821 207.0 207.1 2712.647 2698.562 2726.731 380873.0 730981.4 18800
#> [4,] 236.0956 236.0 236.1 2518.593 2504.508 2534.242 252282.0 458747.7 12957
#> [5,] 244.1000 244.1 244.1 2828.453 2814.369 2844.103 612169.9 1279308.9 31312
#> [6,] 266.0751 266.0 266.1 2828.453 2815.934 2844.103 113219.0 214886.8 5801
#> maxf i sn
#> [1,] 195026.48 1 28.33394
#> [2,] 23860.11 1 16.53987
#> [3,] 40065.74 1 12.87314
#> [4,] 24536.55 1 14.99012
#> [5,] 69898.24 1 24.20989
#> [6,] 11773.56 1 10.83870