State Key Laboratory of Southwestern Chinese Medicine Resources, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China.
School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan 611137, China.
Anal Chem. 2023 Feb 14;95(6):3195-3203. doi: 10.1021/acs.analchem.2c03323. Epub 2023 Feb 2.
Two-dimensional (2D) ¹H-¹³C heteronuclear single quantum coherence (HSQC) has been increasingly applied to metabolomics studies because it greatly improves resolving capability compared with one-dimensional (1D) ¹H NMR. However, preprocessing methods such as peak matching and alignment tools for 2D NMR-based metabolomics have lagged behind similar methods for 1D ¹H NMR-based metabolomics. Correct matching and alignment of 2D NMR spectral features across multiple samples are particularly important for subsequent multivariate data analysis. Considering the different intensity dynamic ranges of a variety of metabolites and the chemical shift variation across the spectra of multiple samples, we developed an efficient peak matching and alignment algorithm for 2D ¹H-¹³C HSQC-based metabolomics, called global intensity-guided peak matching and alignment (GIPMA). In GIPMA, peaks identified in all spectra are pooled together and sorted by intensity. The chemical shift of a stronger peak is regarded as more accurate and reliable than that of a weaker peak. The strongest undesignated peak is chosen as the reference of a new cluster if it is not located within the chemical shift tolerance of any existing peak cluster (PC); otherwise, it is matched to an existing PC, and the aligned chemical shift of that PC is updated as the intensity-weighted average of the chemical shifts of all peaks in the cluster. Setting an optimum chemical shift tolerance (Δδ) is critical for peak matching and alignment across multiple samples. GIPMA dynamically searches for and selects the Δδ for peak matching that maximizes the number of valid peak clusters (vPC), that is, spectral features, among multiple samples.
With GIPMA, fully automatic peakwise matching and alignment do not require any spectrum as an initial reference. The chemical shift of each PC is updated as the intensity-weighted average of the chemical shifts of all peaks in the same PC, which should be statistically more accurate. Accurate chemical shifts for each representative spectral feature facilitate subsequent peak assignment and are essential for correct metabolite identification and result interpretation. The proposed method was demonstrated successfully on the spectra of six model mixtures consisting of seven typical metabolites, yielding correct matching of all known spectral features. The performance of GIPMA was also demonstrated on 2D ¹H-¹³C HSQC spectra of 87 real extracts from 29 samples of five species. Hierarchical cluster analysis (HCA) and principal component analysis (PCA) of the 87 spectra matched and aligned by GIPMA generated a correct classification of the 29 samples into five groups. In summary, GIPMA provides a practical peak matching and alignment method to facilitate 2D NMR-based metabolomics studies.
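The matching procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Peak`, `Cluster`, `match_peaks`, `count_valid` and `select_tolerance` are hypothetical, as are several details the abstract does not specify: the tolerance is treated as a (¹H, ¹³C) pair chosen from a small candidate list, a peak that falls within tolerance of several clusters joins the nearest one, and a cluster counts as "valid" only if it holds exactly one peak from every sample.

```python
from dataclasses import dataclass, field


@dataclass
class Peak:
    sample: int       # index of the spectrum the peak came from
    h_ppm: float      # 1H chemical shift
    c_ppm: float      # 13C chemical shift
    intensity: float


@dataclass
class Cluster:
    h_ppm: float
    c_ppm: float
    peaks: list = field(default_factory=list)

    def add(self, peak):
        # Aligned shift = intensity-weighted average over all member peaks.
        self.peaks.append(peak)
        total = sum(p.intensity for p in self.peaks)
        self.h_ppm = sum(p.h_ppm * p.intensity for p in self.peaks) / total
        self.c_ppm = sum(p.c_ppm * p.intensity for p in self.peaks) / total


def match_peaks(peaks, tol_h, tol_c):
    """Pool all peaks, visit them strongest first, and cluster greedily."""
    clusters = []
    for peak in sorted(peaks, key=lambda p: p.intensity, reverse=True):
        near = [c for c in clusters
                if abs(peak.h_ppm - c.h_ppm) <= tol_h
                and abs(peak.c_ppm - c.c_ppm) <= tol_c]
        if near:
            # Assumption: with several candidates, pick the nearest cluster.
            best = min(near, key=lambda c: ((peak.h_ppm - c.h_ppm) / tol_h) ** 2
                       + ((peak.c_ppm - c.c_ppm) / tol_c) ** 2)
            best.add(peak)
        else:
            # No cluster within tolerance: this peak seeds a new cluster.
            seed = Cluster(peak.h_ppm, peak.c_ppm)
            seed.add(peak)
            clusters.append(seed)
    return clusters


def count_valid(clusters, n_samples):
    # Assumption: "valid" means exactly one peak from each sample.
    return sum(1 for c in clusters
               if len(c.peaks) == n_samples
               and len({p.sample for p in c.peaks}) == n_samples)


def select_tolerance(peaks, n_samples, candidates):
    """Return the candidate tolerance giving the most valid clusters."""
    best = max(candidates,
               key=lambda t: count_valid(match_peaks(peaks, *t), n_samples))
    return best, match_peaks(peaks, *best)


# Synthetic example: two features, each observed once in three spectra.
demo_peaks = [
    Peak(0, 1.000, 20.00, 100), Peak(1, 1.010, 20.05, 90), Peak(2, 0.995, 19.98, 80),
    Peak(0, 1.060, 20.00, 50), Peak(1, 1.065, 20.02, 45), Peak(2, 1.055, 19.99, 40),
]
demo_candidates = [(0.002, 0.01), (0.02, 0.1), (0.1, 0.5)]
tol, clusters = select_tolerance(demo_peaks, 3, demo_candidates)
print(tol, len(clusters))
```

In this synthetic example the tightest tolerance leaves every peak in its own cluster and the loosest merges both features into one, so only the middle pair yields two valid clusters. This is the trade-off that a tolerance search of this kind is meant to resolve.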