Baran Richard, Kochi Hayataro, Saito Natsumi, Suematsu Makoto, Soga Tomoyoshi, Nishioka Takaaki, Robert Martin, Tomita Masaru
Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0017, Japan.
BMC Bioinformatics. 2006 Dec 13;7:530. doi: 10.1186/1471-2105-7-530.
With the advent of metabolomics as a powerful tool for both functional and biomarker discovery, the identification of specific differences between complex metabolite profiles is becoming a major challenge in the data analysis pipeline. The task remains difficult, given the datasets' size, complexity, and common shifts in migration (elution/retention) times between samples analyzed by hyphenated mass spectrometry methods.
We present a Mathematica (Wolfram Research, Inc.) package MathDAMP (Mathematica package for Differential Analysis of Metabolite Profiles), which highlights differences between raw datasets acquired by hyphenated mass spectrometry methods by applying arithmetic operations to all corresponding signal intensities on a datapoint-by-datapoint basis. Peak identification and integration are thus bypassed, and the results are displayed graphically. To facilitate direct comparisons, the raw datasets are automatically preprocessed and normalized in terms of both migration times and signal intensities. A combination of dynamic programming and global optimization is used to align the datasets along the migration time dimension. The processed datasets and the results of direct comparisons between them are visualized using density plots (axes represent migration time and m/z values, while peaks appear as color-coded spots), providing an intuitive overall view. Various forms of comparisons and statistical tests can be applied to highlight subtle differences. Overlaid electropherograms (chromatograms) corresponding to the vicinities of the candidate differences from any result may be generated in descending order of significance for visual confirmation. Additionally, a standard library table (a list of m/z values and migration times for known compounds) may be aligned and overlaid on the plots to allow easier identification of metabolites.
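The datapoint-by-datapoint comparison described above can be illustrated with a minimal sketch. It is written in Python rather than Mathematica and is not MathDAMP's code or API: the function names (`normalize`, `pointwise_difference`, `rank_candidates`), the list-of-lists grid layout (rows as m/z channels, columns as migration time points), and the choice of plain subtraction with a single reference intensity are all assumptions made for illustration. The sketch also assumes the two datasets have already been aligned along the migration time dimension, so it does not show the dynamic programming step.

```python
def normalize(dataset, reference_intensity):
    """Scale all intensities so the (assumed) reference signal equals 1.0."""
    if reference_intensity == 0:
        raise ValueError("reference intensity must be non-zero")
    return [[value / reference_intensity for value in row] for row in dataset]


def pointwise_difference(sample, control):
    """Subtract control from sample at every (m/z, time) grid point.

    Both datasets must share the same grid, i.e. they are assumed
    to be aligned and binned identically beforehand.
    """
    same_shape = len(sample) == len(control) and all(
        len(s_row) == len(c_row) for s_row, c_row in zip(sample, control)
    )
    if not same_shape:
        raise ValueError("datasets must share the same grid")
    return [
        [s - c for s, c in zip(s_row, c_row)]
        for s_row, c_row in zip(sample, control)
    ]


def rank_candidates(difference, top=10):
    """List (mz_index, time_index, difference), largest magnitude first."""
    candidates = [
        (i, j, value)
        for i, row in enumerate(difference)
        for j, value in enumerate(row)
        if value != 0
    ]
    candidates.sort(key=lambda item: abs(item[2]), reverse=True)
    return candidates[:top]


# Hypothetical toy data: 2 m/z channels x 3 migration time points.
control = normalize([[0.0, 2.0, 0.0], [0.0, 4.0, 0.0]], 2.0)
sample = normalize([[0.0, 2.0, 0.0], [0.0, 0.0, 8.0]], 2.0)
difference = pointwise_difference(sample, control)
top_candidates = rank_candidates(difference, top=2)
```

Because no peaks are picked or integrated, every grid point contributes to the result; the ranked list simply points to the regions where overlaid traces would be inspected first.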
Our tool facilitates the automated visualization and identification of differences between complex metabolite profiles according to various criteria, and is useful for data-driven biomarker discovery and functional genomics.