Wang Runzhong, Manjrekar Mrunali, Mahjour Babak, Avila-Pacheco Julian, Provenzano Joules, Reynolds Erin, Lederbauer Magdalena, Mashin Eivgeni, Goldman Samuel, Wang Mingxun, Weng Jing-Ke, Plata Desirée L, Clish Clary B, Coley Connor W
Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, 02139, MA, United States.
Broad Institute of MIT and Harvard, 415 Main St, Cambridge, 02142, MA, United States.
bioRxiv. 2025 Jun 1:2025.05.28.656653. doi: 10.1101/2025.05.28.656653.
Structural elucidation using untargeted tandem mass spectrometry (MS/MS) has played a critical role in advancing scientific discovery [1, 2]. However, differentiating molecular fragmentation patterns between isobaric structures remains a prominent challenge in metabolomics [3-10], drug discovery [11-13], and reaction screening [14-17], presenting a significant barrier to the cost-effective and rapid identification of unknown molecular structures. Here, we present a geometric deep learning model, ICEBERG, that simulates collision-induced dissociation in mass spectrometry to generate chemically plausible fragments and their relative intensities with awareness of collision energies and polarities. We use ICEBERG predictions to facilitate structure elucidation by ranking a set of candidate structures according to the similarity between their predicted MS/MS spectra and an experimental MS/MS spectrum of interest. This integrated elucidation pipeline achieves state-of-the-art performance in compound annotation, with 40% top-1 accuracy on the NIST'20 [M+H]+ adduct subset and with 92% of correct structures appearing in the top ten predictions on the same dataset. We demonstrate several real-world case studies, including identifying clinical biomarkers of depression and tuberculous meningitis, annotating an aqueous abiotic degradation product of the pesticide thiophanate methyl, disambiguating isobaric products in pooled reaction screening, and annotating biosynthetic pathways in . Overall, this deep learning-based, chemically interpretable paradigm for structural elucidation enables rapid molecular annotation from complex mixtures, driving discoveries across diverse scientific domains.
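The candidate-ranking step described in the abstract can be illustrated with a minimal sketch. It assumes cosine similarity over binned spectra as the scoring function, which is a common choice but is not confirmed by the abstract as the metric used by the authors. The function names (`bin_spectrum`, `cosine_similarity`, `rank_candidates`), the bin width, and all peak lists are hypothetical; in particular, the "predicted" spectra are hard-coded toy values standing in for model output, not ICEBERG predictions.

```python
from math import sqrt


def bin_spectrum(peaks, bin_width=0.1):
    """Sum (m/z, intensity) peaks into fixed-width m/z bins (sparse dict)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(experimental, predicted_by_candidate, bin_width=0.1):
    """Return (candidate, score) pairs sorted from most to least similar."""
    reference = bin_spectrum(experimental, bin_width)
    scored = [
        (name, cosine_similarity(reference, bin_spectrum(peaks, bin_width)))
        for name, peaks in predicted_by_candidate.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (assumed values for illustration only).
experimental = [(65.04, 20.0), (91.05, 100.0), (119.05, 45.0)]
predicted = {
    "candidate_A": [(65.04, 15.0), (91.05, 100.0), (119.05, 50.0)],
    "candidate_B": [(77.04, 100.0), (91.05, 30.0), (105.07, 60.0)],
}

ranking = rank_candidates(experimental, predicted)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy case the candidate whose predicted peaks overlap the experimental spectrum is ranked first. A real pipeline would also need to handle precursor filtering, intensity scaling, and collision-energy matching, none of which are shown here.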