Department of Analytical Chemistry, BioSysteMetrics Group, Stockholm University, Stockholm, Sweden.
Anal Bioanal Chem. 2012 Apr;403(2):443-55. doi: 10.1007/s00216-012-5789-x. Epub 2012 Feb 24.
In ¹H NMR metabolomic datasets, there are often over a thousand peaks per spectrum, many of which change position drastically between samples. Automatic alignment, annotation, and quantification of all the metabolites of interest in such datasets have not been feasible. In this work we propose a fully automated annotation and quantification procedure that requires metabolites to be annotated in only a single spectrum. The reference database built from that single spectrum can be used for any number of ¹H NMR datasets with a similar sample matrix. The procedure is based on the generalized fuzzy Hough transform (GFHT) for alignment and on principal component analysis (PCA) for peak selection and quantification. We show that the quantities of 21 metabolites can be established in several ¹H NMR datasets and that the procedure can be extended to include any number of metabolites that can be identified in a single spectrum. The procedure speeds up the quantification of previously known metabolites and also returns a table containing the intensities and locations of all the peaks that were found and aligned but not assigned to a known metabolite. This enables both biopattern analysis of known metabolites and data mining for new potential biomarkers among the unknowns.
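The abstract names PCA as the tool for peak selection and quantification but does not say how it is applied. The following is a minimal sketch of one plausible reading, not the authors' implementation: once peaks are aligned, the intensities of the candidate peaks for one metabolite form a samples-by-peaks matrix, the first principal component scores act as a relative quantity per sample, and the loadings indicate which peaks co-vary and therefore likely belong to the metabolite. The function name `pca_quantify` and the `min_loading` threshold are hypothetical, and the alignment step (GFHT) is assumed to have been done already.

```python
import numpy as np

def pca_quantify(peak_intensities, min_loading=0.2):
    """Illustrative PCA-based peak selection and relative quantification.

    peak_intensities: array of shape (n_samples, n_peaks) holding the aligned
    intensities of the candidate peaks for a single metabolite.
    min_loading: hypothetical cut-off; peaks whose first-component loading
    falls below it are treated as not belonging to the metabolite.
    """
    X = np.asarray(peak_intensities, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-center each peak

    # PCA via singular value decomposition of the centered matrix.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    loading = vt[0]
    if loading.sum() < 0:                        # resolve the sign ambiguity
        loading = -loading

    scores = Xc @ loading                        # relative quantity per sample
    selected = loading >= min_loading            # peaks that co-vary strongly
    explained = s[0] ** 2 / np.sum(s ** 2)       # variance captured by PC1
    return scores, selected, explained
```

The scores returned here are relative and mean-centered; converting them to absolute concentrations would need a calibration against a reference, which the abstract does not describe.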