Anal Chem. 2019 Dec 17;91(24):15686-15693. doi: 10.1021/acs.analchem.9b03849. Epub 2019 Dec 3.
Knowledge of the chemical identity of metabolite molecules is critical for understanding the complex biological systems to which they belong. Since metabolite identities and their concentrations are often directly linked to the phenotype, such information can be used to map biochemical pathways and understand their role in health and disease. A very large number of metabolites, however, are still unknown; i.e., their spectroscopic signatures do not match those in existing databases, which makes the identification of unknown molecules both imperative and challenging. Although metabolites are structurally highly diverse, the majority share a rather limited number of structural motifs, which are defined by sets of ¹H and ¹³C chemical shifts of the same spin system. This allows one to characterize unknown metabolites by a divide-and-conquer strategy that identifies their structural motifs first. Here, we present the structural motif-based approach "SUMMIT Motif" for the de novo identification of unknown molecular structures in complex mixtures, without the need for extensive purification, using NMR in tandem with two newly curated NMR molecular structural motif metabolomics databases (MSMMDBs). To identify the structural motifs, the ¹H and ¹³C chemical shifts of all individual spin systems are first extracted from 2D and 3D NMR spectra of the complex mixture. Next, the molecular structural motifs are identified by querying these chemical shifts against the new MSMMDBs. One database, COLMAR MSMMDB, was derived from experimental NMR chemical shifts of known metabolites taken from the COLMAR metabolomics database, while the other, pNMR MSMMDB, is based on predicted chemical shifts of metabolites from several existing large metabolomics databases. For molecules consisting of multiple spin systems, the spin systems are connected via long-range scalar J-couplings. When this motif-based identification method was applied to the hydrophilic extract of mouse bile fluid, two unknown metabolites were successfully identified. This approach is both accurate and efficient for the identification of unknown metabolites and hence enables the discovery of new biochemical processes and potential biomarkers.
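The database query step described above can be illustrated with a minimal sketch. It is not the SUMMIT Motif implementation. It assumes a spin system is a list of (¹H, ¹³C) chemical-shift pairs, and that a motif matches when every one of its peaks pairs with a distinct query peak within a tolerance. The names `Motif`, `match_score`, `query_motifs`, and `TOY_DB`, the tolerance values, the scoring function, and all shift values are hypothetical and chosen only for illustration; they are not taken from COLMAR MSMMDB or pNMR MSMMDB.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Peak = Tuple[float, float]  # (1H shift in ppm, 13C shift in ppm) of one C-H pair


@dataclass(frozen=True)
class Motif:
    """A hypothetical database entry: a named motif and its peak list."""
    name: str
    peaks: Tuple[Peak, ...]


def match_score(query: Sequence[Peak], motif: Motif,
                tol_h: float = 0.05, tol_c: float = 0.5) -> Optional[float]:
    """Return a deviation score (lower is better), or None if there is no match.

    Assumed rule: the query and motif must have the same number of peaks, and
    each motif peak must pair with a distinct query peak within both tolerances.
    Pairing is greedy (nearest unused peak), which is not guaranteed optimal.
    """
    if len(query) != len(motif.peaks):
        return None
    unused = list(query)
    total = 0.0
    for mh, mc in motif.peaks:
        best, best_d = None, None
        for qh, qc in unused:
            dh, dc = abs(qh - mh), abs(qc - mc)
            if dh <= tol_h and dc <= tol_c:
                # Deviation normalized by the tolerance in each dimension.
                d = (dh / tol_h) ** 2 + (dc / tol_c) ** 2
                if best_d is None or d < best_d:
                    best, best_d = (qh, qc), d
        if best is None:
            return None  # this motif peak has no partner in the query
        unused.remove(best)
        total += best_d
    return math.sqrt(total / len(motif.peaks))


def query_motifs(query: Sequence[Peak], database: Sequence[Motif],
                 tol_h: float = 0.05, tol_c: float = 0.5) -> List[Tuple[str, float]]:
    """Return (motif name, score) for all matching motifs, best match first."""
    hits = []
    for motif in database:
        score = match_score(query, motif, tol_h, tol_c)
        if score is not None:
            hits.append((motif.name, score))
    return sorted(hits, key=lambda hit: hit[1])


# Toy database with invented shift values (not real metabolite data).
TOY_DB = [
    Motif("motif_A", ((3.50, 60.0), (1.20, 20.0))),
    Motif("motif_B", ((3.52, 60.2), (1.90, 25.0))),
    Motif("motif_C", ((7.30, 130.0),)),
]

# One spin system, as it might be extracted from the mixture spectra.
query = [(1.21, 20.1), (3.51, 60.1)]
hits = query_motifs(query, TOY_DB)
print(hits)
```

In this toy query only `motif_A` is returned, because `motif_B` has a second peak with no partner in the query and `motif_C` has a different number of peaks. A more careful version would replace the greedy pairing with an optimal assignment. The sketch also omits the later step of connecting multiple spin systems via long-range scalar J-couplings.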