Create report of analyte differences

Create a report showing the most significant differences between two sets of samples. Optionally create extracted ion chromatograms for the most significant differences.

Methods

object = "xcmsSet": diffreport(object, class1 = levels(sampclass(object))[1], class2 = levels(sampclass(object))[2], filebase = character(), eicmax = 0, eicwidth = 200, sortpval = TRUE, classeic = c(class1,class2), value=c("into","maxo","intb"), metlin = FALSE, h=480,w=640, mzdec=2, missing = numeric(), ...)

Arguments

object: the xcmsSet object
class1: character vector with the first set of sample classes to be compared
class2: character vector with the second set of sample classes to be compared
filebase: base file name to save report, .tsv file and _eic will be appended to this name for the tabular report and EIC directory, respectively. if blank nothing will be saved
eicmax: number of the most significantly different analytes to create EICs for
eicwidth: width (in seconds) of EICs produced
sortpval: logical indicating whether the reports should be sorted by p-value
classeic: character vector with the sample classes to include in the EICs
value: intensity values to be used for the diffreport.
If value="into", integrated peak intensities are used.
If value="maxo", maximum peak intensities are used.
If value="intb", baseline corrected integrated peak intensities are used (only available if peak detection was done by findPeaks.centWave).
metlin: mass uncertainty to use for generating link to Metlin metabolite database. the sign of the uncertainty indicates negative or positive mode data for M+H or M-H calculation. a value of FALSE or 0 removes the column
h: Numeric variable for the height of the eic and boxplots that are printed out.
w: Numeric variable for the width of the eic and boxplots print out made.
mzdec: Number of decimal places of title m/z values in the eic plot.
missing: numeric(1) defining an optional value for missing values. missing = 0 would e.g. replace all NA values in the feature matrix with 0. Note that also a call to fillPeaks results in a feature matrix in which NA values are replaced by 0.
...: optional arguments to be passed to mt.teststat from the multtest package.

Details

This method handles creation of summary reports with statistics about which analytes were most significantly different between two sets of samples. It computes Welch's two-sample t-statistic for each analyte and ranks them by p-value. It returns a summary report that can optionally be written out to a tab-separated file.

Additionally, it does all the heavy lifting involved in creating superimposed extracted ion chromatograms for a given number of analytes. It does so by reading the raw data files associated with the samples of interest one at a time. As it does so, it prints the name of the sample it is currently reading. Depending on the number and size of the samples, this process can take a long time.

If a base file name is provided, the report (see Value section) will be saved to a tab separated file. If EICs are generated, they will be saved as 640x480 PNG files in a newly created subdirectory. However this parameter can be changed with the commands arguments. The numbered file names correspond to the rows in the report.

Chromatographic traces in the EICs are colored and labeled by their sample class. Sample classes take their color from the current palette. The color a sample class is assigned is dependent its order in the xcmsSet object, not the order given in the class arguments. Thus levels(sampclass(object))[1] would use color palette()[1] and so on. In that way, sample classes maintain the same color across any number of different generated reports.

When there are multiple sample classes, xcms will produce boxplots of the different classes and will generate a single anova p-value statistic. Like the eic's the plot number corresponds to the row number in the report.

Value

A data frame with the following columns:

fold: mean fold change (always greater than 1, see tstat for which set of sample classes was higher)
tstat: Welch's two sample t-statistic, positive for analytes having greater intensity in class2, negative for analytes having greater intensity in class1
pvalue: p-value of t-statistic
anova: p-value of the anova statistic if there are multiple classes
mzmed: median m/z of peaks in the group
mzmin: minimum m/z of peaks in the group
mzmax: maximum m/z of peaks in the group
rtmed: median retention time of peaks in the group
rtmin: minimum retention time of peaks in the group
rtmax: maximum retention time of peaks in the group
npeaks: number of peaks assigned to the group
Sample Classes: number samples from each sample class represented in the group
metlin: A URL to metlin for that mass
...: one column for every sample class
Sample Names: integrated intensity value for every sample
...: one column for every sample

Methods

Arguments

Details

Value

See also