
xcms result object for very large data sets
Source: R/XcmsExperimentHdf5-functions.R, R/XcmsExperimentHdf5.R
XcmsExperimentHdf5.Rd
The xcms result objects XcmsExperiment() and XCMSnExp() keep all preprocessing results in memory and can thus, depending on the size of the data set, require a large amount of memory. In contrast, the XcmsExperimentHdf5 class uses an on-disk data storage mechanism and therefore has a much lower memory footprint, which makes it possible to analyze very large data sets on regular computer systems such as desktop or laptop computers. With some exceptions, including additional parameters, the functionality and usability of this object are identical to those of the default XcmsExperiment object.
This help page lists functions that have additional or different parameters or properties compared to the respective methods for XcmsExperiment() objects. All functions not listed here are used in the same way as for XcmsExperiment() objects (see the respective help page for information).
Usage
toXcmsExperimentHdf5(object, hdf5File = tempfile())

toXcmsExperiment(object, ...)

# S4 method for class 'XcmsExperimentHdf5'
chromPeakData(
  object,
  msLevel = integer(),
  peaks = character(),
  return.type = c("DataFrame", "data.frame"),
  bySample = FALSE
)

# S4 method for class 'XcmsExperimentHdf5'
filterChromPeaks(
  object,
  keep = rep(TRUE, nrow(chromPeaks(object))),
  method = "keep",
  ...
)

# S4 method for class 'XcmsExperimentHdf5,PeakGroupsParam'
adjustRtimePeakGroups(object, param = PeakGroupsParam(), msLevel = 1L)

# S4 method for class 'XcmsExperimentHdf5'
filterFeatureDefinitions(object, features = integer())
Arguments
- object
XcmsExperimentHdf5 object.
- hdf5File
For toXcmsExperimentHdf5(): character(1) with the path and name of the (not yet existing) file in which the preprocessing results should be stored.
- ...
additional parameters passed on to downstream functions.
- msLevel
For chromPeaks() and chromPeakData(): optional integer with the MS level(s) from which the data should be returned. By default (msLevel = integer()) results from all MS levels are returned (if present). For refineChromPeaks(): integer(1) with the MS level from which chromatographic peaks should be refined.
- peaks
For chromPeakData(): optional character with the IDs of the chromatographic peaks (row names in chromPeaks()) for which the data should be returned. By default (peaks = character()) the data for all chromatographic peaks is returned.
- return.type
For chromPeakData(): character(1) specifying the type of object that should be returned. Can be either return.type = "DataFrame" (the default) to return a DataFrame, or return.type = "data.frame" to return the results as a data.frame.
- bySample
For chromPeaks() and chromPeakData(): logical(1) whether the data should be returned by sample, i.e. as a list of matrix or data.frame objects, one for each sample.
- keep
For filterChromPeaks(): defines the chromatographic peaks to keep: either a logical with the same length as the number of chromatographic peaks, an integer with the indices or a character with the IDs of the chromatographic peaks to keep.
- method
For filterChromPeaks(): character(1); currently only method = "keep" is supported.
- param
parameter object defining and configuring the algorithm to be used.
- features
For filterFeatureDefinitions(): defines the features to keep: either a logical with the same length as the number of features, an integer with the indices or a character with the IDs of the features to keep.
Details
The XcmsExperimentHdf5 object stores all preprocessing results (except adjusted retention times, which are stored as an additional spectra variable in the object's Spectra::Spectra() object) in a file in HDF5 format.
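XcmsExperimentHdf5 uses a different naming scheme for chromatographic peaks: for efficiency reasons, chromatographic peak data is organized by sample and MS level. The chrom peak IDs are hence in the format CP<MS level>S<sample id><chrom peak index>, with <MS level> being the MS level in which the chromatographic peaks were detected, <sample id> the ID of the sample (usually related to the index in the original MsExperiment object) and <chrom peak index> the index of the chromatographic peak within that sample and MS level.
HDF5 files do not support parallel processing, thus preprocessing results need to be stored or loaded sequentially.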
All functionality for XcmsExperimentHdf5 objects is optimized to reduce memory demand, possibly at the cost of lower performance.
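Because the sample and MS level are encoded in the peak IDs, they can be recovered from the ID alone. The following is a minimal base R sketch, not part of xcms: `parse_chrom_peak_id()` is a hypothetical helper, and it assumes that the trailing six digits are the per-sample peak index, as suggested by IDs such as CP1S2000004 in the Examples output on this page.

```r
## Hypothetical helper (not part of xcms): split chrom peak IDs into parts.
## Assumption: the last six digits are the per-sample peak index.
parse_chrom_peak_id <- function(ids) {
    m <- regmatches(ids, regexec("^CP([0-9]+)S([0-9]+)([0-9]{6})$", ids))
    ok <- lengths(m) == 4L
    if (!all(ok))
        stop("unexpected ID format: ", paste(ids[!ok], collapse = ", "))
    data.frame(
        id = ids,
        ms_level = as.integer(vapply(m, `[`, character(1), 2L)),
        sample = as.integer(vapply(m, `[`, character(1), 3L)),
        peak_index = as.integer(vapply(m, `[`, character(1), 4L))
    )
}

parsed <- parse_chrom_peak_id(c("CP1S1000001", "CP1S2000004"))
parsed
```

If the index width differs in a given xcms version, the regular expression would need to be adapted.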
Conversion between XcmsExperiment and XcmsExperimentHdf5
To use the XcmsExperimentHdf5 class for preprocessing results, the hdf5File parameter of the findChromPeaks() function needs to be defined, specifying the path and name of the HDF5 file in which to store the results. In addition, it is possible to convert an XcmsExperiment object to an XcmsExperimentHdf5 object with the toXcmsExperimentHdf5() function. All present preprocessing results will be stored in the specified HDF5 file.
To load all preprocessing results into memory and hence change from an XcmsExperimentHdf5 to an XcmsExperiment object, the toXcmsExperiment() function can be used.
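The two conversion functions can be combined into a round trip. The sketch below only defines a function and does not run any preprocessing; `round_trip` and its argument names are hypothetical, and `x` is assumed to be an existing XcmsExperiment with preprocessing results. The calls use the signatures listed in the Usage section.

```r
## Sketch only: calling this function requires the xcms package.
## `round_trip` is a hypothetical helper name, not part of xcms.
round_trip <- function(x, h5 = tempfile(fileext = ".h5")) {
    ## Store all present preprocessing results in a new (not yet existing)
    ## HDF5 file and switch to the on-disk representation
    x_h5 <- xcms::toXcmsExperimentHdf5(x, hdf5File = h5)
    ## Load the results back into memory as an XcmsExperiment
    xcms::toXcmsExperiment(x_h5)
}
```

For real analyses a permanent file path is probably preferable to `tempfile()`, since temporary files are removed at the end of the R session.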
Using the HDF5 file-based on-disk data storage
Calling findChromPeaks() on an MsExperiment using the parameter hdf5File will return an instance of the XcmsExperimentHdf5 class and hence use the on-disk data storage mode described on this page. The results are stored in the file specified with parameter hdf5File.
Subset
- [: subset the XcmsExperimentHdf5 object to the specified samples. Parameters keepChromPeaks (default TRUE), keepAdjustedRtime (default FALSE) and keepFeatures (default FALSE) configure whether present chromatographic peaks, alignment or correspondence results should be retained. This only changes information in the object (i.e., the reference to the respective entries in the HDF5 file) and does not change the content of the HDF5 file. Note that keepChromPeaks = FALSE also sets keepFeatures to FALSE.
- filterChromPeaks() and filterFeatureDefinitions(): filter the chromatographic peak and correspondence results, respectively. See documentation below for details. Subsetting using unsorted or duplicated indices is not supported.
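Since unsorted or duplicated indices are not supported, it can help to check an index vector before using it for subsetting. The following base R sketch shows one way to do this; `is_valid_subset_index()` is a hypothetical helper and not part of xcms.

```r
## Hypothetical helper (not part of xcms): TRUE if `i` is a strictly
## increasing numeric index without missing values, i.e. sorted and
## free of duplicates.
is_valid_subset_index <- function(i) {
    is.numeric(i) && !anyNA(i) && !is.unsorted(i, strictly = TRUE)
}

is_valid_subset_index(c(1, 3))   # TRUE
is_valid_subset_index(c(3, 1))   # FALSE: unsorted
is_valid_subset_index(c(2, 2))   # FALSE: duplicated

## With an XcmsExperimentHdf5 object `xmse` (assumed to exist), samples 1
## and 3 could then be selected while also keeping alignment results:
## xmse[c(1, 3), keepAdjustedRtime = TRUE]
```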
Functionality related to chromatographic peaks
- chromPeaks() gains the parameter bySample = FALSE that, if set to TRUE, returns a list of chromPeaks matrices, one for each sample. Due to the way data is organized in XcmsExperimentHdf5 objects this is more efficient than bySample = FALSE. Thus, in cases where chrom peak data is subsequently evaluated or processed by sample, it is suggested to use bySample = TRUE.
- chromPeakData() gains a new parameter peaks = character() that specifies the chromatographic peaks for which data should be returned. The IDs of these chromatographic peaks (row names in chromPeaks()) should be provided with the peaks parameter. This can reduce the memory requirement in cases where only the data of some selected chromatographic peaks needs to be extracted. Also, chromPeakData() supports the bySample parameter described for chromPeaks() above.
- filterChromPeaks() filters the chromatographic peaks; the peaks to retain are specified with the keep parameter. This can be either a logical, character or integer vector. Duplicated or unsorted indices are not supported. Feature definitions, if present, will be updated as well. The function returns the object with the filtered chromatographic peaks.
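The per-sample list returned with bySample = TRUE lends itself to sample-wise summaries without first building one large matrix. The sketch below uses a small mock list in place of a real `chromPeaks(xmse, bySample = TRUE)` result so that it runs without xcms; the numbers are copied from the Examples output on this page, and `xmse` in the comments is assumed to be an existing XcmsExperimentHdf5 object.

```r
## Mock data standing in for `chromPeaks(xmse, bySample = TRUE)`:
## one matrix per sample, rows named with the chrom peak IDs.
res <- list(
    matrix(c(2682.913, 2679.783, 360, 344, 5641322, 5210016), nrow = 2,
           dimnames = list(c("CP1S1000001", "CP1S1000002"),
                           c("rt", "mz", "into"))),
    matrix(c(2686.042, 2686.042, 360, 344, 10248211, 5700652), nrow = 2,
           dimnames = list(c("CP1S2000001", "CP1S2000002"),
                           c("rt", "mz", "into")))
)

## Summarize each sample separately
n_peaks <- vapply(res, nrow, integer(1))
max_into <- vapply(res, function(z) max(z[, "into"]), numeric(1))

## With xcms loaded, data for selected peaks only could be requested as:
## chromPeakData(xmse, peaks = c("CP1S1000001", "CP1S2000001"))
```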
Retention time alignment
- adjustRtimePeakGroups() and adjustRtime() with PeakGroupsParam: parameter extraPeaks of PeakGroupsParam is ignored. Anchor peaks are thus only defined using the minFraction and the optional subset parameter.
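A possible alignment workflow under these constraints is sketched below. It only defines a function; `align_peak_groups` and its argument names are hypothetical, `x` is assumed to be an XcmsExperimentHdf5 object with detected chromatographic peaks, and it is assumed that an initial correspondence with PeakDensityParam is used to define the peak groups, as in the standard xcms workflow.

```r
## Sketch only: calling this function requires the xcms package.
## `align_peak_groups` is a hypothetical helper name, not part of xcms.
align_peak_groups <- function(x, sample_groups, min_fraction = 0.9) {
    ## Initial correspondence to define the candidate anchor peaks
    x <- xcms::groupChromPeaks(
        x, param = xcms::PeakDensityParam(sampleGroups = sample_groups))
    ## `extraPeaks` is not set because it is ignored for this class;
    ## anchor peaks are selected through `minFraction` only
    xcms::adjustRtime(
        x, param = xcms::PeakGroupsParam(minFraction = min_fraction))
}
```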
Correspondence analysis results
- featureDefinitions(): similarly to featureDefinitions() for XcmsExperiment objects, this method returns a data.frame with the characteristics of the defined LC-MS features. The function for XcmsExperimentHdf5 does, however, not return the "peakidx" column with the indices of the chromatographic peaks per feature. Also, the columns are returned in alphabetical order.
- featureValues(): for parameter value, the option value = "index" (i.e. returning the index of the chromatographic peaks within the chromPeaks() matrix per feature) is not supported.
- filterFeatureDefinitions(): filter the feature definitions, keeping only the specified features. Parameter features defines the features to retain. It supports a logical, integer indices or a character with the IDs of the features (i.e., their row names in featureDefinitions()). The function returns the input XcmsExperimentHdf5 with the filtered content.
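The three supported forms of the features parameter can describe the same selection. The base R sketch below illustrates this on a mock data.frame standing in for `featureDefinitions(xmse)`; the feature IDs and values are invented for illustration only, and `xmse` in the final comment is assumed to be an existing XcmsExperimentHdf5 object with correspondence results.

```r
## Mock stand-in for `featureDefinitions(xmse)`; IDs and values are
## invented for illustration only.
fdef <- data.frame(mzmed = c(279.0, 344.0, 365.0),
                   rtmed = c(2790.1, 2682.5, 2680.9),
                   row.names = c("FT1", "FT2", "FT3"))

## Three equivalent ways to express "keep features with mzmed above 300"
keep_lgl <- fdef$mzmed > 300          # logical, one element per feature
keep_int <- which(keep_lgl)           # integer indices (sorted, unique)
keep_chr <- rownames(fdef)[keep_lgl]  # character feature IDs

## With xcms loaded, any of them could be passed on, e.g.:
## xmse <- filterFeatureDefinitions(xmse, features = keep_chr)
```

Deriving the integer indices with `which()` has the side benefit that they are always sorted and free of duplicates.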
Examples
## Create an MsExperiment object representing the data from an LC-MS
## experiment.
library(MsExperiment)
## xcms provides findChromPeaks(), CentWaveParam() and the result classes
library(xcms)
## Define the raw data files
fls <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
         system.file('cdf/KO/ko16.CDF', package = "faahKO"),
         system.file('cdf/KO/ko18.CDF', package = "faahKO"))
## Define a data frame with the sample characterization
df <- data.frame(mzML_file = basename(fls),
                 sample = c("ko15", "ko16", "ko18"))
## Import the data. This will initialize a `Spectra` object representing
## the raw data and assign these to the individual samples.
mse <- readMsExperiment(spectraFiles = fls, sampleData = df)
## Perform chromatographic peak detection storing the data in an HDF5 file
## Parameter `hdf5File` has to be provided and needs to be the path and
## name of a (not yet existing) file to which results are going to be
## stored. For the example below we use a temporary file.
xmse <- findChromPeaks(mse, param = CentWaveParam(prefilter = c(4, 100000)),
                       hdf5File = tempfile())
xmse
#> Object of class XcmsExperimentHdf5
#> Spectra: MS1 (3834)
#> Experiment data: 3 sample(s)
#> Sample data links:
#> - spectra: 3 sample(s) to 3834 element(s).
#> xcms results:
#> - chromatographic peaks in MS level(s): 1
#> results storage file:
#> /tmp/RtmpoSkzhK/file2c47282dfc4
## Extract selected columns from the chromatographic peak detection
## results
chromPeaks(xmse, columns = c("rt", "mz", "into")) |> head()
#> rt mz into sample
#> CP1S1000001 2682.913 360 5641322 1
#> CP1S1000002 2679.783 344 5210016 1
#> CP1S1000003 2678.218 343 24147443 1
#> CP1S1000004 2679.783 365 14975761 1
#> CP1S1000005 2659.438 365 3520591 1
#> CP1S1000006 2784.635 280 2537599 1
## Extract the results per sample
res <- chromPeaks(xmse, columns = c("rt", "mz", "into"), bySample = TRUE)
## The chromatographic peaks of the second sample:
res[[2]] |> head()
#> rt mz into
#> CP1S2000001 2686.042 360 10248211
#> CP1S2000002 2686.042 344 5700652
#> CP1S2000003 2686.042 343 26229546
#> CP1S2000004 2596.840 365 2358688
#> CP1S2000005 2686.042 365 15565868
#> CP1S2000006 2797.154 279 10916521
## Convert the result object to the in-memory representation:
xmse_mem <- toXcmsExperiment(xmse)
xmse_mem
#> Object of class XcmsExperiment
#> Spectra: MS1 (3834)
#> Experiment data: 3 sample(s)
#> Sample data links:
#> - spectra: 3 sample(s) to 3834 element(s).
#> xcms results:
#> - chromatographic peaks: 181 in MS level(s): 1