Chau Hoi Yan Katharine, Zhang Xinran, Ressom Habtom W
Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057, USA.
Metabolites. 2025 Feb 14;15(2):132. doi: 10.3390/metabo15020132.
Background/Objectives: Liquid chromatography coupled with mass spectrometry (LC-MS) is a commonly used platform in metabolomics studies. However, metabolite annotation remains a major bottleneck in these studies. One reason is that publicly available spectral libraries are limited: they contain tandem mass spectrometry (MS/MS) data acquired from only a fraction of known compounds. Deep learning methods are increasingly reported as an alternative to spectral matching because they can model complex relationships between molecular fingerprints and mass spectrometric measurements. The objectives of this study are to investigate deep learning methods for predicting molecular fingerprints from MS/MS spectra and to rank putative metabolite IDs by the similarity between their known and predicted molecular fingerprints.
Methods: We trained three types of deep learning models to capture the relationships between molecular fingerprints and MS/MS spectra. Before training, MS/MS spectra were obtained from the National Institute of Standards and Technology (NIST), MassBank of North America (MoNA), and the Human Metabolome Database (HMDB). These spectra went through several processing steps, including scaling, binning, and filtering. In addition, the most relevant m/z bins and molecular fingerprints were selected. The trained models were evaluated on ranking putative metabolite IDs, retrieved from a compound database, for the challenges in the Critical Assessment of Small Molecule Identification (CASMI) 2016, CASMI 2017, and CASMI 2022 benchmark datasets.
Results: Feature selection effectively reduced redundant molecular and spectral features before model training. Deep learning models trained with the reduced feature sets performed comparably to CSI:FingerID in ranking putative metabolite IDs.
Conclusions: The results demonstrate the promising potential of deep learning methods for metabolite annotation.
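The abstract does not specify the bin width, scaling function, filtering threshold, or similarity measure. The following is therefore a minimal Python sketch under stated assumptions, not the authors' pipeline. It illustrates two of the steps named above.

The first step turns an MS/MS peak list into a fixed-length vector. Peaks are filtered by relative intensity, square-root scaled, and summed into m/z bins. The 1.0 bin width, the 1000 m/z upper bound, the 1% intensity cutoff, and the square-root transform are all assumptions.

The second step ranks candidate IDs. It compares a predicted fingerprint (per-bit probabilities) with each candidate's known binary fingerprint, using a Tanimoto coefficient extended to real-valued vectors.

The names `bin_spectrum`, `tanimoto`, and `rank_candidates` are hypothetical. The deep learning model that would map a binned spectrum to a predicted fingerprint is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers illustrating spectrum preprocessing and candidate ranking.
# All parameter values are assumptions, not the settings used in the study.


def bin_spectrum(
    peaks: Sequence[Tuple[float, float]],
    bin_width: float = 1.0,  # assumed bin width in m/z units
    mz_max: float = 1000.0,  # assumed upper m/z bound
    min_rel_intensity: float = 0.01,  # assumed filter: drop peaks below 1% of base peak
) -> List[float]:
    """Filter, scale, and bin an MS/MS peak list [(mz, intensity), ...]."""
    n_bins = int(math.ceil(mz_max / bin_width))
    vec = [0.0] * n_bins
    base = max((inten for _, inten in peaks), default=0.0)
    if base <= 0:
        return vec  # empty or all-zero spectrum
    for mz, inten in peaks:
        rel = inten / base  # scale relative to the base peak
        if rel < min_rel_intensity or not 0.0 <= mz < mz_max:
            continue  # filter weak or out-of-range peaks
        idx = min(int(mz // bin_width), n_bins - 1)
        vec[idx] += math.sqrt(rel)  # assumed square-root scaling dampens dominant peaks
    return vec


def tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """Tanimoto coefficient extended to real-valued vectors.

    Equals the usual binary Tanimoto when both inputs are 0/1 vectors.
    """
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom > 0 else 0.0


def rank_candidates(
    predicted: Sequence[float], candidates: Dict[str, Sequence[int]]
) -> List[Tuple[str, float]]:
    """Rank candidate IDs by similarity of their known fingerprint to the predicted one."""
    scored = [(cid, tanimoto(predicted, fp)) for cid, fp in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    x = bin_spectrum([(91.05, 1000.0), (119.05, 250.0), (150.0, 5.0)], mz_max=200.0)
    print(len(x), x[91], x[119], x[150])  # 200 1.0 0.5 0.0
    # A trained model would map x to per-bit probabilities; these values are made up.
    predicted = [0.9, 0.1, 0.8, 0.2]
    candidates = {"A": [1, 0, 1, 0], "B": [0, 1, 0, 1], "C": [1, 1, 1, 0]}
    print([cid for cid, _ in rank_candidates(predicted, candidates)])  # ['A', 'C', 'B']
```

Sorting by score puts the most similar putative ID first. The choice of similarity measure and the handling of ties are design decisions, and the scoring used in the study may differ.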