
xcms result object for very large data sets
Source: R/XcmsExperimentHdf5-functions.R, R/XcmsExperimentHdf5.R
XcmsExperimentHdf5.Rd
The xcms result objects XcmsExperiment() and XCMSnExp() keep all
preprocessing results in memory and can thus (depending on the size of the
data set) require a large amount of memory. In contrast, the
XcmsExperimentHdf5 class, by using an on-disk data storage mechanism,
has a much lower memory footprint, which also allows very large data
sets to be analyzed on regular computer systems such as desktop or laptop
computers. With some exceptions, including additional parameters, the
functionality and usability of this object are identical to those of the
default XcmsExperiment object.
This help page lists functions whose parameters or properties differ from
those of the respective methods for XcmsExperiment() objects. All functions
not listed here are used in the same way as for XcmsExperiment() objects
(see the respective help page for information).
Usage
toXcmsExperimentHdf5(object, hdf5File = tempfile())
toXcmsExperiment(object, ...)
# S4 method for class 'XcmsExperimentHdf5'
chromPeakData(
  object,
  msLevel = integer(),
  peaks = character(),
  columns = character(),
  return.type = c("DataFrame", "data.frame"),
  bySample = FALSE
)
# S4 method for class 'XcmsExperimentHdf5'
filterChromPeaks(
  object,
  keep = rep(TRUE, nrow(chromPeaks(object))),
  method = "keep",
  ...
)
# S4 method for class 'XcmsExperimentHdf5,PeakGroupsParam'
adjustRtimePeakGroups(object, param = PeakGroupsParam(), msLevel = 1L)
# S4 method for class 'XcmsExperimentHdf5'
filterFeatureDefinitions(object, features = integer())
Arguments
- object
XcmsExperimentHdf5 object.
- hdf5File
For toXcmsExperimentHdf5(): character(1) with the path and name of the (not yet existing) file in which the preprocessing results should be stored.
- ...
additional parameters passed on to downstream functions.
- msLevel
For chromPeaks() and chromPeakData(): optional integer with the MS level(s) from which the data should be returned. By default (msLevel = integer()) results from all MS levels are returned (if present). For refineChromPeaks(): integer(1) with the MS level from which chromatographic peaks should be refined.
- peaks
For chromPeakData(): optional character with the IDs of the chromatographic peaks (row names in chromPeaks()) for which the data should be returned. By default (peaks = character()) the data for all chromatographic peaks is returned.
- columns
For chromPeakData(): optional character defining a subset of columns that should be included in the returned data frame. By default (columns = character()) the full data is returned.
- return.type
For chromPeakData(): character(1) specifying the type of object that should be returned. Can be either return.type = "DataFrame" (the default) to return a DataFrame, or return.type = "data.frame" to return the results as a data.frame.
- bySample
For chromPeaks() and chromPeakData(): logical(1) whether the data should be returned by sample, i.e. as a list of matrix or data.frame objects, one for each sample.
- keep
For filterChromPeaks(): defines the chromatographic peaks to keep: either a logical with the same length as the number of chromatographic peaks, an integer with the indices or a character with the IDs of the chromatographic peaks to keep.
- method
For filterChromPeaks(): character(1); currently only method = "keep" is supported.
- param
parameter object defining and configuring the algorithm to be used.
- features
For filterFeatureDefinitions(): defines the features to keep: either a logical with the same length as the number of features, an integer with the indices or a character with the IDs of the features to keep.
Details
The XcmsExperimentHdf5 object stores all preprocessing results (except
adjusted retention times, which are stored as an additional spectra variable
in the object's Spectra::Spectra() object), in a file in HDF5 format.
XcmsExperimentHdf5 uses a different naming scheme for chromatographic
peaks: for efficiency reasons, chromatographic peak data is organized by
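sample and MS level. The chrom peak IDs are hence in the format
CP<MS level>S<sample id><chrom peak index> (see the IDs in the Examples
section, such as CP1S1000001), where <MS level> is the MS level in which the
chromatographic peaks were detected, <sample id> identifies the sample in
the original MsExperiment object and <chrom peak index> is the index of the
chromatographic peak within that sample and MS level.
HDF5 files do not support parallel processing, thus preprocessing results
need to be stored or loaded sequentially.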
All functionality for XcmsExperimentHdf5 objects is optimized to reduce
memory demand at the cost of potentially lower performance.
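The peak IDs printed in the Examples section (for instance CP1S1000001 and CP1S2000001) encode the MS level and the sample of each peak. As a rough illustration, the base R sketch below splits such IDs into their parts. The helper name parse_chrom_peak_id() is hypothetical and not part of xcms, and the assumption that the peak index occupies the last six digits is inferred from the example IDs only; it may not hold for other data sets.

```r
## Hypothetical helper (NOT part of xcms): split chrom peak IDs into parts.
## ASSUMPTION: the peak index is the last `index_width` digits of the ID,
## as suggested by the IDs shown in the Examples output.
parse_chrom_peak_id <- function(ids, index_width = 6L) {
    m <- regmatches(ids, regexec("^CP([0-9]+)S([0-9]+)$", ids))
    if (!all(lengths(m) == 3L))
        stop("unexpected chrom peak ID format")
    ms_level <- as.integer(vapply(m, `[`, character(1), 2L))
    rest <- vapply(m, `[`, character(1), 3L)  # sample id followed by index
    n <- nchar(rest)
    data.frame(
        id = ids,
        ms_level = ms_level,
        sample_id = substr(rest, 1L, n - index_width),
        peak_index = as.integer(substr(rest, n - index_width + 1L, n)),
        stringsAsFactors = FALSE
    )
}

## IDs taken from the Examples output of this page
parsed <- parse_chrom_peak_id(c("CP1S1000001", "CP1S2000006"))
parsed
```

Under the stated assumption, both IDs resolve to MS level 1, with the first peak of sample 1 and the sixth peak of sample 2.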
Conversion between XcmsExperiment and XcmsExperimentHdf5
To use the XcmsExperimentHdf5 class for preprocessing results, the
hdf5File parameter of the findChromPeaks() function needs to be defined,
specifying the path and name of the HDF5 file to store the results. In
addition it is possible to convert an XcmsExperiment object to an
XcmsExperimentHdf5 object with the toXcmsExperimentHdf5() function. All
present preprocessing results will be stored to the specified HDF5 file.
To load all preprocessing results into memory and hence change from an
XcmsExperimentHdf5 to an XcmsExperiment object, the toXcmsExperiment()
function can be used.
Using the HDF5 file-based on-disk data storage
Calling findChromPeaks() on an MsExperiment using the parameter
hdf5File will return an instance of the XcmsExperimentHdf5 class and
hence use the on-disk data storage mode described on this page. The results
are stored in the file specified with parameter hdf5File.
Subset
- [: subset the XcmsExperimentHdf5 object to the specified samples. Parameters keepChromPeaks (default TRUE), keepAdjustedRtime (default TRUE) and keepFeatures (default FALSE) control whether present chromatographic peaks, alignment or correspondence results should be retained. This will only change information in the object (i.e., the reference to the respective entries in the HDF5 file), but will not change the content of the HDF5 file. Thus, reverting the retention times of detected chromatographic peaks is not supported and keepChromPeaks = TRUE with keepAdjustedRtime = FALSE will throw an error. Note that with keepChromPeaks = FALSE, keepFeatures is also set to FALSE.
- filterChromPeaks() and filterFeatureDefinitions(): filter the chromatographic peak and correspondence results, respectively. See documentation below for details. Subsetting with unsorted or duplicated indices is not supported.
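Because unsorted or duplicated indices are not supported, it can help to check a selection before using it for subsetting. The following base R sketch shows one possible check; the helper is_sorted_unique() is a hypothetical name and not part of xcms.

```r
## Hypothetical helper (NOT part of xcms): TRUE if `i` is strictly
## increasing, i.e. sorted and free of duplicates.
is_sorted_unique <- function(i) {
    i <- as.integer(i)
    !anyNA(i) && !is.unsorted(i, strictly = TRUE)
}

is_sorted_unique(c(1, 3))     # TRUE
is_sorted_unique(c(3, 1))     # FALSE: unsorted
is_sorted_unique(c(2, 2, 3))  # FALSE: duplicated

## Bring an arbitrary selection into a supported form. Note that this
## changes the order and drops repeated entries.
i <- c(3, 1, 3)
i_ok <- sort(unique(i))
i_ok
```

A cleaned index such as i_ok could then be used to subset by sample, keeping in mind that the original order of the selection is lost.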
Functionality related to chromatographic peaks
- chromPeaks() gains parameter bySample = FALSE that, if set to TRUE, returns a list of chromPeaks matrices, one for each sample. Due to the way data is organized in XcmsExperimentHdf5 objects this is more efficient than bySample = FALSE. Thus, in cases where chrom peak data is subsequently evaluated or processed by sample, it is suggested to use bySample = TRUE.
- chromPeakData() gains a new parameter peaks = character() which specifies the chromatographic peaks for which data should be returned. For these chromatographic peaks the ID (row name in chromPeaks()) should be provided with the peaks parameter. This can reduce the memory requirement for cases in which only data of some selected chromatographic peaks needs to be extracted. Also, chromPeakData() supports the bySample parameter described for chromPeaks() above. All other parameters that are also present for chromPeakData() of XcmsExperiment objects, such as columns, are supported.
- filterChromPeaks() filters the chromatographic peaks, with the keep parameter specifying which peaks should be retained. This can be either a logical, character or integer vector. Duplicated or unsorted indices are not supported. Any feature definitions present will be updated as well. The function returns the object with the filtered chromatographic peaks.
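To illustrate per-sample processing of a bySample = TRUE result, the sketch below works on a stand-in list built by hand from the first rows of the Examples output, so it runs without xcms or an HDF5 file. The object res only mimics the structure described above (a list with one matrix per sample), and the intensity threshold is an arbitrary value chosen for illustration.

```r
## Stand-in for chromPeaks(xmse, columns = c("rt", "mz", "into"),
## bySample = TRUE): one numeric matrix per sample. Values are copied from
## the Examples output of this page (first three peaks of samples 1 and 2).
res <- list(
    matrix(c(2682.913, 360, 5641322,
             2679.783, 344, 5210016,
             2678.218, 343, 24147443),
           ncol = 3, byrow = TRUE,
           dimnames = list(c("CP1S1000001", "CP1S1000002", "CP1S1000003"),
                           c("rt", "mz", "into"))),
    matrix(c(2686.042, 360, 10248211,
             2686.042, 344, 5700652,
             2686.042, 343, 26229546),
           ncol = 3, byrow = TRUE,
           dimnames = list(c("CP1S2000001", "CP1S2000002", "CP1S2000003"),
                           c("rt", "mz", "into")))
)

## Per-sample summaries: number of peaks and largest integrated intensity
n_peaks <- vapply(res, nrow, integer(1))
max_into <- vapply(res, function(z) max(z[, "into"]), numeric(1))

## IDs of peaks above an arbitrary intensity threshold, collected per sample
threshold <- 1e7
keep_ids <- unlist(lapply(res, function(z) {
    rownames(z)[z[, "into"] > threshold]
}))
keep_ids
```

Since keep accepts a character vector of peak IDs, a vector like keep_ids is the kind of input that could be passed to filterChromPeaks().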
Retention time alignment
- adjustRtimePeakGroups() and adjustRtime() with PeakGroupsParam: parameter extraPeaks of PeakGroupsParam is ignored. Anchor peaks are thus only defined using the minFraction and the optional subset parameter.
Correspondence analysis results
- featureDefinitions(): similarly to featureDefinitions() for XcmsExperiment objects, this method returns a data.frame with the characteristics of the defined LC-MS features. The function for XcmsExperimentHdf5 does however not return the "peakidx" column with the indices of the chromatographic peaks per feature. Also, the columns are returned in alphabetic order.
- featureValues(): for parameter value, the option value = "index" (i.e. returning the index of the chromatographic peaks within the chromPeaks() matrix per feature) is not supported.
- filterFeatureDefinitions(): filter the feature definitions, keeping only the specified features. Parameter features can be used to define the features to retain. It supports a logical, integer indices or a character with the IDs of the features (i.e., their row names in featureDefinitions()). The function returns the input XcmsExperimentHdf5 with the filtered content.
Examples
## Create a MsExperiment object representing the data from an LC-MS
## experiment.
library(xcms)
library(MsExperiment)
## Define the raw data files
fls <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
         system.file('cdf/KO/ko16.CDF', package = "faahKO"),
         system.file('cdf/KO/ko18.CDF', package = "faahKO"))
## Define a data frame with the sample characterization
df <- data.frame(mzML_file = basename(fls),
                 sample = c("ko15", "ko16", "ko18"))
## Import the data. This will initialize a `Spectra` object representing
## the raw data and assign these to the individual samples.
mse <- readMsExperiment(spectraFiles = fls, sampleData = df)
## Perform chromatographic peak detection storing the data in an HDF5 file
## Parameter `hdf5File` has to be provided and needs to be the path and
## name of a (not yet existing) file to which results are going to be
## stored. For the example below we use a temporary file.
xmse <- findChromPeaks(mse, param = CentWaveParam(prefilter = c(4, 100000)),
                       hdf5File = tempfile())
xmse
#> Object of class XcmsExperimentHdf5
#> Spectra: MS1 (3834)
#> Experiment data: 3 sample(s)
#> Sample data links:
#> - spectra: 3 sample(s) to 3834 element(s).
#> xcms results:
#> - chromatographic peaks in MS level(s): 1
#> results storage file:
#> /tmp/Rtmp2DqkaB/file363b4de0cb68
## Extract selected columns from the chromatographic peak detection
## results
chromPeaks(xmse, columns = c("rt", "mz", "into")) |> head()
#>                   rt  mz     into sample
#> CP1S1000001 2682.913 360  5641322      1
#> CP1S1000002 2679.783 344  5210016      1
#> CP1S1000003 2678.218 343 24147443      1
#> CP1S1000004 2679.783 365 14975761      1
#> CP1S1000005 2659.438 365  3520591      1
#> CP1S1000006 2784.635 280  2537599      1
## Extract the results per sample
res <- chromPeaks(xmse, columns = c("rt", "mz", "into"), bySample = TRUE)
## The chromatographic peaks of the second sample:
res[[2]] |> head()
#>                   rt  mz     into
#> CP1S2000001 2686.042 360 10248211
#> CP1S2000002 2686.042 344  5700652
#> CP1S2000003 2686.042 343 26229546
#> CP1S2000004 2596.840 365  2358688
#> CP1S2000005 2686.042 365 15565868
#> CP1S2000006 2797.154 279 10916521
## Convert the result object to the in-memory representation:
xmse_mem <- toXcmsExperiment(xmse)
xmse_mem
#> Object of class XcmsExperiment
#> Spectra: MS1 (3834)
#> Experiment data: 3 sample(s)
#> Sample data links:
#> - spectra: 3 sample(s) to 3834 element(s).
#> xcms results:
#> - chromatographic peaks: 181 in MS level(s): 1