Department of Mathematics and Computer Science, University of Antwerp, Belgium.
BMC Bioinformatics. 2011 Oct 20;12:405. doi: 10.1186/1471-2105-12-405.
Nuclear magnetic resonance (NMR) spectroscopy is a powerful technique for revealing and comparing quantitative metabolic profiles of biological tissues. However, chemical and physical variations between samples make the data challenging to analyze and typically require several preprocessing steps before interpretation. For example, noise reduction, normalization, baseline correction, peak picking, spectrum alignment and statistical analysis are indispensable components of any NMR analysis pipeline.
We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm called hierarchical Cluster-based Peak Alignment (CluPA). The algorithm aligns a target spectrum to a reference spectrum in a top-down fashion: it builds a hierarchical cluster tree from the peak lists of the reference and target spectra and then divides the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time needed to estimate the spectral misalignment, the method uses fast Fourier transform (FFT) cross-correlation. Because the method returns a high-quality alignment, we can propose a simple methodology for studying the variability of the NMR spectra. For each aligned NMR data point, the ratio of the between-group to the within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic in a one-way ANOVA, but makes no distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data.
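The two numerical ingredients described above, shift estimation by FFT cross-correlation and the BW-ratio with a resampled null distribution, can be sketched as follows. This is a minimal, dependency-free illustration and not the speaq implementation: the function names (`fft`, `estimate_shift`, `bw_ratio`, `bootstrap_p_value`), the `max_shift` parameter, and the pooled resampling scheme used for the null distribution are all assumptions made for the example. The BW-ratio is assumed to use the usual one-way ANOVA sums of squares. In practice an optimized FFT library would replace the small recursive FFT shown here.

```python
import cmath
import random


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def estimate_shift(reference, target, max_shift):
    """Return the lag s (|s| <= max_shift) maximising sum_i reference[i + s] * target[i].

    Moving the target by s points (negative s = towards lower indices)
    brings it into register with the reference.
    """
    n = len(reference)
    size = 1
    while size < 2 * n:  # zero-pad so circular correlation equals linear correlation
        size *= 2
    ref_f = fft(list(reference) + [0.0] * (size - n))
    tgt_f = fft(list(target) + [0.0] * (size - n))
    product = [a * b.conjugate() for a, b in zip(ref_f, tgt_f)]
    corr = [v.real for v in fft(product, inverse=True)]
    # Negative lags wrap around to the end of the correlation array.
    return max(range(-max_shift, max_shift + 1), key=lambda s: corr[s % size])


def bw_ratio(values, labels):
    """Between-group / within-group sum of squares for one aligned data point."""
    grand_mean = sum(values) / len(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    between = within = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        between += len(members) * (mean - grand_mean) ** 2
        within += sum((v - mean) ** 2 for v in members)
    return between / within if within > 0 else float("inf")


def bootstrap_p_value(values, labels, n_boot=1000, seed=0):
    """Approximate the null by resampling pooled values with replacement (an assumption)."""
    rng = random.Random(seed)
    observed = bw_ratio(values, labels)
    exceed = 0
    for _ in range(n_boot):
        resampled = rng.choices(values, k=len(values))
        if bw_ratio(resampled, labels) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_boot + 1)
```

For example, `bw_ratio([1, 2, 3, 4], ["a", "a", "b", "b"])` returns 4.0: the between-group sum of squares is 4 and the within-group sum of squares is 1. In a CluPA-style workflow, a function like `estimate_shift` would be applied to each segment produced by the hierarchical clustering step, which this sketch does not cover.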
The performance of the workflow was evaluated using a previously published dataset. Correlation maps, spectral plots and grey-scale plots show clear improvements over other methods, and the simple quantitative analysis works well for the CluPA-aligned spectra. The whole workflow is embedded in a modular and statistically sound framework, implemented as an R package called "speaq" ("spectrum alignment and quantitation"), which is freely available from http://code.google.com/p/speaq/.