Create report of analyte differences
diffreport-methods.Rd
Create a report showing the most significant differences between two sets of samples. Optionally create extracted ion chromatograms (EICs) for the most significant differences.
Methods
- object = "xcmsSet"
diffreport(object, class1 = levels(sampclass(object))[1], class2 = levels(sampclass(object))[2], filebase = character(), eicmax = 0, eicwidth = 200, sortpval = TRUE, classeic = c(class1,class2), value=c("into","maxo","intb"), metlin = FALSE, h=480,w=640, mzdec=2, missing = numeric(), ...)
Arguments
- object
the xcmsSet object
- class1
character vector with the first set of sample classes to be compared
- class2
character vector with the second set of sample classes to be compared
- filebase
base file name under which to save the report. ".tsv" and "_eic" will be appended to this name for the tabular report and the EIC directory, respectively. If blank, nothing will be saved.
- eicmax
number of the most significantly different analytes to create EICs for
- eicwidth
width (in seconds) of EICs produced
- sortpval
logical indicating whether the reports should be sorted by p-value
- classeic
character vector with the sample classes to include in the EICs
- value
intensity values to be used for the diffreport. If value = "into", integrated peak intensities are used. If value = "maxo", maximum peak intensities are used. If value = "intb", baseline-corrected integrated peak intensities are used (only available if peak detection was done by findPeaks.centWave).
- metlin
mass uncertainty used to generate a link to the Metlin metabolite database. The sign of the uncertainty indicates negative- or positive-mode data for the M-H or M+H calculation, respectively. A value of FALSE or 0 removes the column.
- h
Numeric; height (in pixels) of the EIC and boxplot images that are written out.
- w
Numeric; width (in pixels) of the EIC and boxplot images that are written out.
- mzdec
Number of decimal places of the m/z values shown in the EIC plot titles.
- missing
numeric(1) defining an optional value for missing values. missing = 0 would, e.g., replace all NA values in the feature matrix with 0. Note that a call to fillPeaks also results in a feature matrix in which NA values are replaced by 0.
- ...
optional arguments to be passed to mt.teststat from the multtest package.
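As a minimal sketch of the effect described for the missing argument, the base R code below replaces every NA in a small, hypothetical feature matrix with a single value. The helper replace_missing() is hypothetical and not part of xcms; it only mirrors the documented behavior (an empty missing leaves the matrix untouched).

```r
# Hypothetical feature matrix: rows are features (peak groups), columns are samples.
feat <- matrix(c(120, NA, 95,
                 NA,  40, 38),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("F1", "F2"), c("s1", "s2", "s3")))

# Hypothetical helper: if a single replacement value is given, every NA becomes
# that value; with the default numeric() the matrix is returned unchanged.
replace_missing <- function(x, missing = numeric()) {
  if (length(missing) == 1) x[is.na(x)] <- missing
  x
}

filled <- replace_missing(feat, missing = 0)
print(filled)
```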
Details
This method handles creation of summary reports with statistics about which analytes were most significantly different between two sets of samples. It computes Welch's two-sample t-statistic for each analyte and ranks them by p-value. It returns a summary report that can optionally be written out to a tab-separated file.
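The statistics described above can be approximated in base R. The sketch below illustrates the idea under stated assumptions and is not the xcms implementation: it uses t.test() with var.equal = FALSE (Welch's test) in place of mt.teststat from multtest, and the feature matrix, class labels, and fold-change rule (ratio of class means, inverted when below 1) are assumptions chosen to match the fold and tstat descriptions in the Value section.

```r
# Hypothetical intensities: 3 features x 6 samples (3 per class).
values <- rbind(
  F1 = c(10, 12, 11, 30, 29, 33),
  F2 = c(50, 48, 52, 49, 51, 50),
  F3 = c(80, 85, 78, 20, 22, 25)
)
classes <- c("ctrl", "ctrl", "ctrl", "treat", "treat", "treat")
class1 <- "ctrl"
class2 <- "treat"
c1 <- classes %in% class1
c2 <- classes %in% class2

stats <- t(apply(values, 1, function(x) {
  # t.test(a, b) reports mean(a) - mean(b): passing class2 first makes the
  # statistic positive when intensity is higher in class2.
  tt <- t.test(x[c2], x[c1], var.equal = FALSE)  # Welch two-sample test
  fold <- mean(x[c2]) / mean(x[c1])
  if (fold < 1) fold <- 1 / fold                 # fold >= 1; tstat sign gives direction
  c(fold = fold, tstat = unname(tt$statistic), pvalue = tt$p.value)
}))

report <- as.data.frame(stats)
report <- report[order(report$pvalue), ]  # rank by p-value, as with sortpval = TRUE
print(report)
```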
Additionally, it does all the heavy lifting involved in creating superimposed extracted ion chromatograms for a given number of analytes. It does so by reading the raw data files associated with the samples of interest one at a time. As it does so, it prints the name of the sample it is currently reading. Depending on the number and size of the samples, this process can take a long time.
If a base file name is provided, the report (see Value section) is saved to a tab-separated file. If EICs are generated, they are saved by default as 640x480 PNG files in a newly created subdirectory; the image size can be changed with the w and h arguments. The numbered file names correspond to the rows in the report.
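A sketch of how the names derived from filebase fit together, using base R's write.table() on a small hypothetical report. The column set, the base name, and the write.table() options are assumptions; the file that diffreport itself writes may differ in detail.

```r
# Hypothetical report with a few of the columns listed in the Value section.
report <- data.frame(fold   = c(3.5, 2.25),
                     tstat  = c(-12.5, 8.75),
                     pvalue = c(1e-4, 3e-4),
                     mzmed  = c(300.1, 445.2))

filebase <- file.path(tempdir(), "ko_vs_wt")  # hypothetical base file name
tsv      <- paste0(filebase, ".tsv")          # tabular report
eicdir   <- paste0(filebase, "_eic")          # directory that would hold EIC images

# Write a tab-separated file and read it back to check the round trip.
write.table(report, tsv, sep = "\t", quote = FALSE, row.names = FALSE)
back <- read.delim(tsv)
print(back)
```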
Chromatographic traces in the EICs are colored and labeled by their sample class. Sample classes take their color from the current palette. The color assigned to a sample class depends on its order in the xcmsSet object, not on the order given in the class arguments. Thus levels(sampclass(object))[1] would use color palette()[1], and so on. In that way, sample classes keep the same color across any number of generated reports.
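The color rule above can be illustrated with a hypothetical class_color() helper (not part of xcms) that looks a class up by its position in levels() rather than by the order in which it was requested.

```r
# Hypothetical sample classes in the order stored in the object.
sample_classes <- factor(c("KO", "KO", "WT", "WT"))
levs <- levels(sample_classes)  # factor levels, alphabetical by default

# Hypothetical helper: color = palette entry at the class's level index.
class_color <- function(cls) palette()[match(cls, levs)]

# Requesting the classes in the opposite order does not change their colors.
classeic <- c("WT", "KO")
cols <- setNames(class_color(classeic), classeic)
print(cols)
```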
When there are multiple sample classes, xcms produces boxplots of the different classes and computes an ANOVA p-value. As with the EICs, the plot number corresponds to the row number in the report.
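A per-analyte one-way ANOVA can be sketched with base R's aov(). This is an assumption about how such a p-value could be computed, not a copy of the xcms code; the feature matrix and the three classes are hypothetical.

```r
# Hypothetical intensities: 2 features x 9 samples in three classes.
values <- rbind(F1 = c(10, 11, 12, 20, 21, 22, 30, 31, 29),
                F2 = c( 5,  6,  5,  6,  5,  6,  5,  6,  5))
sclass <- factor(rep(c("A", "B", "C"), each = 3))

# One-way ANOVA per feature; keep the p-value of the class term.
anova_p <- apply(values, 1, function(x) {
  fit <- aov(x ~ sclass)
  summary(fit)[[1]][["Pr(>F)"]][1]
})
print(anova_p)
```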
Value
A data frame with the following columns:
- fold
mean fold change (always at least 1; see tstat for which set of sample classes was higher)
- tstat
Welch's two-sample t-statistic, positive for analytes having greater intensity in class2, negative for analytes having greater intensity in class1
- pvalue
p-value of the t-statistic
- anova
p-value of the ANOVA statistic if there are multiple classes
- mzmed
median m/z of peaks in the group
- mzmin
minimum m/z of peaks in the group
- mzmax
maximum m/z of peaks in the group
- rtmed
median retention time of peaks in the group
- rtmin
minimum retention time of peaks in the group
- rtmax
maximum retention time of peaks in the group
- npeaks
number of peaks assigned to the group
- Sample Classes
number of samples from each sample class represented in the group
- metlin
a URL to Metlin for that mass
- ...
one column for every sample class
- Sample Names
integrated intensity value for every sample
- ...
one column for every sample
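The metlin argument describes a mass window whose center depends on the ionization mode. The hypothetical metlin_window() helper below sketches that calculation under two assumptions not stated above: a positive metlin value means positive-mode [M+H]+ data and a negative value means negative-mode [M-H]- data, and the neutral mass is obtained by subtracting or adding the proton mass (1.007276 Da). It only computes the window; it does not build the Metlin URL.

```r
proton <- 1.007276  # proton mass in Da (assumed adduct correction)

# Hypothetical helper: neutral-mass search window for each median m/z.
metlin_window <- function(mzmed, metlin) {
  shift   <- if (metlin > 0) -proton else proton  # assumed: > 0 is [M+H]+, < 0 is [M-H]-
  neutral <- mzmed + shift
  tol     <- abs(metlin)                          # the sign only encodes the mode
  cbind(mass_min = neutral - tol, mass_max = neutral + tol)
}

w_pos <- metlin_window(c(181.0707, 203.0526), 0.005)
w_neg <- metlin_window(179.0561, -0.005)
print(w_pos)
print(w_neg)
```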