Alm Erik, Torgrip Ralf J O, Aberg K Magnus, Schuppe-Koistinen Ina, Lindberg Johan
Dept. of Analytical Chemistry, BioSysteMetrics Group, Stockholm University, 106 91 Stockholm, Sweden.
Anal Bioanal Chem. 2009 Sep;395(1):213-23. doi: 10.1007/s00216-009-2940-4. Epub 2009 Jul 22.
This paper approaches the problem of intersample peak correspondence in the context of subsequently applying statistical data analysis techniques to one-dimensional (1D) 1H nuclear magnetic resonance (NMR) data. Any data analysis methodology will fail to produce meaningful results if the analyzed data table is not synchronized, i.e., if a given analyzed variable (a frequency, in Hz) does not originate from the same chemical source throughout the entire dataset. Unsynchronized data are typical of NMR spectra from biological samples. In this paper, we present a new state-of-the-art solution to this problem using the generalized fuzzy Hough transform (GFHT). We describe significant improvements made since the method was introduced for NMR datasets of plasma in Csenki et al. (Anal Bioanal Chem 389:875-885, 2007); the method is now capable of synchronizing peaks in more complex datasets, such as urine, as well as in plasma data. We present a novel way of globally modeling peak shifts using principal component analysis, a new algorithm for calculating the transform, and an effective peak detection algorithm. The algorithm is applied to two real metabonomic 1H NMR datasets, and the properties of the method are compared with those of bucketing. We implicitly prove that the GFHT establishes the objectively true correspondence. Desirable features of the GFHT are: (1) it establishes intersample peak correspondence even if peaks change order on the frequency axis, and (2) it is symmetric with respect to the samples.
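To make the synchronization problem concrete, the following minimal Python sketch shows how fixed-width bucketing, the baseline the GFHT is compared against, can place one resonance in different variables in different samples. It is not the GFHT and is not taken from the paper. The Lorentzian line shape, the 0.04 ppm bucket width, the peak positions, and the helper names `lorentzian` and `bucket` are all illustrative assumptions. The axis is in ppm, which is proportional to frequency in Hz at a fixed spectrometer frequency.

```python
import numpy as np

def lorentzian(ppm, center, hwhm=0.001):
    """Unit-height Lorentzian line centred at `center` (ppm); illustrative shape only."""
    return hwhm**2 / ((ppm - center) ** 2 + hwhm**2)

def bucket(spectrum, ppm, width=0.04, start=0.0, stop=10.0):
    """Sum spectral intensity inside fixed-width buckets (uniform binning)."""
    n_buckets = int(round((stop - start) / width))
    edges = np.linspace(start, stop, n_buckets + 1)
    # np.histogram with weights sums the intensities falling in each bucket
    totals, _ = np.histogram(ppm, bins=edges, weights=spectrum)
    return totals

# Hypothetical example: the same resonance in two samples, shifted by 0.01 ppm
# (e.g. by a small pH difference), straddles the bucket boundary at 3.04 ppm.
ppm = np.linspace(0.0, 10.0, 100_001)
sample_a = lorentzian(ppm, 3.035)
sample_b = lorentzian(ppm, 3.045)

# Rows = samples, columns = bucket variables
table = np.vstack([bucket(sample_a, ppm), bucket(sample_b, ppm)])
cols = table.argmax(axis=1)  # bucket holding most of the peak in each sample
print(cols)                  # → [75 76]
print(cols[0] == cols[1])    # → False: same compound, different variables
```

Shifting or widening the buckets can rescue this particular pair, but it moves the boundary problem to other peaks, and wider buckets merge neighbouring resonances. This is the kind of failure that a peak-correspondence method such as the GFHT is meant to avoid.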