Parsons Helen M, Ludwig Christian, Günther Ulrich L, Viant Mark R
Centre for Systems Biology, The University of Birmingham, Edgbaston, Birmingham, UK.
BMC Bioinformatics. 2007 Jul 2;8:234. doi: 10.1186/1471-2105-8-234.
Classifying nuclear magnetic resonance (NMR) spectra is a crucial step in many metabolomics experiments. Since several multivariate classification techniques depend upon the variance of the data, it is important to first minimise any contribution from unwanted technical variance arising from sample preparation and analytical measurements, and thereby maximise any contribution from wanted biological variance between different classes. The generalised logarithm (glog) transform was developed to stabilise the variance in DNA microarray datasets, but has rarely been applied to metabolomics data. In particular, it has not been rigorously evaluated against other scaling techniques used in metabolomics, nor tested on all forms of NMR spectra, including 1-dimensional (1D) 1H spectra, 1D projections of 2D 1H,1H J-resolved spectra (pJRES), and intact 2D J-resolved (JRES) spectra.
Here, the effects of the glog transform are compared against two commonly used variance-stabilising techniques, autoscaling and Pareto scaling, as well as against unscaled data. The four methods are evaluated in terms of their effects on the variance of NMR metabolomics data and on the classification accuracy following multivariate analysis, the latter achieved using principal component analysis followed by linear discriminant analysis. For two of the three datasets analysed, classification accuracies were highest following glog transformation: 100% accuracy for discriminating 1D NMR spectra of hypoxic and normoxic invertebrate muscle, and 100% accuracy for discriminating 2D JRES spectra of fish livers sampled from two rivers. For the third dataset, pJRES spectra of urine from two breeds of dog, the glog transform and autoscaling achieved equal highest accuracies. Additionally, we extended the glog algorithm to effectively suppress noise, which proved critical for the analysis of 2D JRES spectra.
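The three scaling methods compared above can be illustrated with a minimal Python sketch. The abstract does not give formulas, so the definitions below are assumptions based on commonly used forms: autoscaling divides each mean-centred variable by its standard deviation, Pareto scaling divides by the square root of the standard deviation, and the glog is taken as ln(x + sqrt(x^2 + lambda)). The offset `y0` stands in for the extended glog and is likewise an assumed form. The function names, the value of `lam`, and the synthetic data are hypothetical, and the calibration of lambda from technical replicates is not shown.

```python
import numpy as np


def autoscale(X):
    """Mean-centre each column (variable) and divide by its standard deviation."""
    centred = X - X.mean(axis=0)
    return centred / X.std(axis=0, ddof=1)


def pareto_scale(X):
    """Mean-centre each column and divide by the square root of its standard deviation."""
    centred = X - X.mean(axis=0)
    return centred / np.sqrt(X.std(axis=0, ddof=1))


def glog(X, lam, y0=0.0):
    """Generalised logarithm, assumed form: ln(z + sqrt(z^2 + lam)) with z = X - y0.

    With y0 = 0 this is the basic glog; a non-zero y0 is an assumed form of
    the extended transform. lam must be positive so the log argument stays > 0,
    which also keeps the transform defined for negative intensities.
    """
    z = X - y0
    return np.log(z + np.sqrt(z ** 2 + lam))


if __name__ == "__main__":
    # Synthetic stand-in for a spectra matrix: rows are samples, columns are bins.
    rng = np.random.default_rng(0)
    spectra = rng.lognormal(mean=2.0, sigma=1.0, size=(6, 4))

    lam = 1.0  # arbitrary illustrative value, not a calibrated parameter
    for name, scaled in [
        ("autoscaled", autoscale(spectra)),
        ("Pareto scaled", pareto_scale(spectra)),
        ("glog", glog(spectra, lam)),
    ]:
        # Per-variable spread after each treatment.
        print(name, np.round(scaled.std(axis=0, ddof=1), 3))
```

Autoscaling and Pareto scaling act on each variable across samples and fail for constant columns (zero standard deviation), whereas the glog is applied element-wise and needs only the parameter lambda.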
We have demonstrated that the glog and extended glog transforms stabilise the technical variance in NMR metabolomics datasets. This significantly improves the discrimination between sample classes and has resulted in higher classification accuracies compared to unscaled, autoscaled or Pareto-scaled data. Additionally, we have confirmed the broad applicability of the glog approach with three disparate datasets from different biological samples, comprising 1D NMR spectra, 1D projections of 2D JRES spectra, and intact 2D JRES spectra.