Landmark-based alignment: aligning a dataset against an external reference
Source:R/XcmsExperiment.R
, R/do_adjustRtime-functions.R
, R/functions-Params.R
, and 1 more
LamaParama.Rd
Alignment is achieved using the ['adjustRtime()'] method with a `param` of class `LamaParama`. This method corrects retention time by aligning chromatographic data with an external reference dataset.
Chromatographic peaks in the experimental data are first matched to predefined (external) landmark features based on their mass-to-charge ratio and retention time and subsequently the data is aligned by minimizing the differences in retention times between the matched chromatographic peaks and lamas. This adjustment is performed file by file.
Adjustable parameters such as `ppm`, `tolerance`, and `toleranceRt` define acceptable deviations during the matching process. It's crucial to note that only lamas and chromatographic peaks exhibiting a one-to-one mapping are considered when estimating retention time shifts. If a file has no peaks matching with lamas, no adjustment will be performed, and the the retention times will be returned as-is. Users can evaluate this matching, for example, by checking the number of matches and ranges of the matching peaks, by first running `[matchLamasChromPeaks()]`.
Different warping methods are available; users can choose to fit a *loess* (`method = "loess"`, the default) or a *gam* (`method = "gam"`) between the reference data points and observed matching ChromPeaks. Additional parameters such as `span`, `weight`, `outlierTolerance`, `zeroWeight`, and `bs` are specific to these models. These parameters offer flexibility in fine-tuning how the matching chromatographic peaks are fitted to the lamas, thereby generating a model to align the overall retention time for a single file.
Other functions related to this method:
- `LamaParama()`: return the respective parameter object for alignment using `adjustRtime()` function. It is also the input for the functions listed below.
- `matchLamasChromPeaks()`: quickly matches each file's ChromPeaks to Lamas, allowing the user to evaluate the matches for each file.
- `summarizeLamaMatch()`: generates a summary of the `LamaParama` method. See below for the details of the return object.
- `matchedRtimes()`: Access the list of `data.frame` saved in the `LamaParama` object, generated by the `matchLamasChromPeaks()` function.
- `plot()`:plot the chromatographic peaks versus the reference lamas as well as the fitting line for the chosen model type. The user can decide what file to inspect by specifying the assay number with the parameter `assay`
Usage
# S4 method for class 'XcmsExperiment,LamaParama'
adjustRtime(object, param, BPPARAM = bpparam(), ...)
matchLamasChromPeaks(object, param, BPPARAM = bpparam())
summarizeLamaMatch(param)
matchedRtimes(param)
LamaParama(
lamas = matrix(ncol = 2, nrow = 0, dimnames = list(NULL, c("mz", "rt"))),
method = c("loess", "gam"),
span = 0.5,
outlierTolerance = 3,
zeroWeight = 10,
ppm = 20,
tolerance = 0,
toleranceRt = 5,
bs = "tp"
)
# S4 method for class 'LamaParama,ANY'
plot(
x,
index = 1L,
colPoints = "#00000060",
colFit = "#00000080",
xlab = "Matched Chromatographic peaks",
ylab = "Lamas",
...
)
Arguments
- object
An object of class `XcmsExperiment` with defined ChromPeaks.
- param
An object of class `LamaParama` that will later be used for adjustment using the `[adjustRtime()]` function.
- BPPARAM
For `matchLamasChromPeaks()`: parallel processing setup. Defaults to `BPPARAM = bpparam()`. See [bpparam()] for more information.
- ...
For `plot()`: extra parameters to be passed to the function.
- lamas
For `LamaParama`: `matrix` or `data.frame` with the m/z and retention times values of features (as first and second column) from the external dataset on which the alignment will be based on.
- method
For `LamaParama`:`character(1)` with the type of warping. Either `method = "gam"` or `method = "loess"` (default).
- span
For `LamaParama`: `numeric(1)` defining the degree of smoothing (`method = "loess"`). This parameter is passed to the internal call to [loess()].
- outlierTolerance
For `LamaParama`: `numeric(1)` defining the settings for outlier removal during the fitting. By default (with `outlierTolerance = 3`), all data points with absolute residuals larger than 3 times the mean absolute residual of all data points from the first, initial fit, are removed from the final model fit.
- zeroWeight
For `LamaParama`: `numeric(1)`: defines the weight of the first data point (i.e. retention times of the first lama-chromatographic peak pair). Values larger than 1 reduce warping problems in the early RT range.
- ppm
For `LamaParama`: `numeric(1)` defining the m/z-relative maximal allowed difference in m/z between `lamas` and chromatographic peaks. Used for the mapping of identified chromatographic peaks and lamas.
- tolerance
For `LamaParama`: `numeric(1)` defining the absolute acceptable difference in m/z between lamas and chromatographic peaks. Used for the mapping of identified chromatographic peaks and `lamas`.
- toleranceRt
For `LamaParama`: `numeric(1)` defining the absolute acceptable difference in retention time between lamas and chromatographic peaks. Used for the mapping of identified chromatographic peaks and `lamas`.
- bs
For `LamaParama()`: `character(1)` defining the GAM smoothing method. (defaults to thin plate, `bs = "tp"`)
- x
For `plot()`: object of class `LamaParama` to be plotted.
- index
For `plot()`: `numeric(1)` index of the file that should be plotted.
- colPoints
For `plot()`: color for the plotting of the datapoint.
- colFit
For `plot()`: color of the fitting line.
- xlab, ylab
For `plot()`: x- and y-axis labels.
Value
For `matchLamasChromPeaks()`: A `LamaParama` object with new slot `rtMap` composed of a list of matrices representing the 1:1 matches between Lamas (ref) and ChromPeaks (obs). To access this, `matchedRtimes()` can be used.
For `matchedRtimes()`: A list of `data.frame` representing matches between chromPeaks and `lamas` for each files.
For `summarizeLamaMatch()`:A `data.frame` with:
- "Total_peaks": total number of chromatographic peaks in the file.
- "Matched_peak": The number of matched peaks to Lamas.
- "Total_Lamas": Total number of Lamas.
- "Model_summary": `summary.loess` or `summary.gam` object for each file.
Note
If there are no matches when using `matchLamasChromPeaks()`, the file retention will not be adjusted when calling [adjustRtime()] with the same `LamaParama` and `XcmsExperiment` object.
To see examples on how to utilize this methods and its functionality, see the vignette.
Examples
## load test and reference datasets
ref <- loadXcmsData("xmse")
tst <- loadXcmsData("faahko_sub2")
## create lamas input from the reference dataset
library(MsExperiment)
#> Loading required package: ProtGenerics
#>
#> Attaching package: ‘ProtGenerics’
#> The following object is masked from ‘package:stats’:
#>
#> smooth
f <- sampleData(ref)$sample_type
f[f == "QC"] <- NA
ref <- filterFeatures(ref, PercentMissingFilter(threshold = 0, f = f))
#> 5 features were removed
ref_mz_rt <- featureDefinitions(ref)[, c("mzmed","rtmed")]
## Set up the LamaParama object
param <- LamaParama(lamas = ref_mz_rt, method = "loess", span = 0.5,
outlierTolerance = 3, zeroWeight = 10, ppm = 20,
tolerance = 0, toleranceRt = 20, bs = "tp")
## input into `adjustRtime()`
tst_adjusted <- adjustRtime(tst, param = param)
## run diagnostic functions to pre-evaluate alignment
param <- matchLamasChromPeaks(tst, param = param)
mtch <- matchedRtimes(param)
## Access summary of matches and model information
summary <- summarizeLamaMatch(param)
#> Warning: number of columns of result, 3, is not a multiple of vector length 2 of arg 1
#> Warning: number of columns of result, 3, is not a multiple of vector length 2 of arg 1
#> Warning: number of columns of result, 3, is not a multiple of vector length 2 of arg 1
##coverage for each file
summary$Matched_peaks / summary$Total_peaks * 100
#> [1] 39.08046 51.00000 55.73770
## Access the information on the model of for the first file
summary$model_summary[[1]]
#> NULL