Chen Bin, Li Hailiang, Huang Rongfu, Tang Yanan, Li Feng
Key Laboratory of Green Chemistry & Technology of Ministry of Education, College of Chemistry, Sichuan University, Chengdu, Sichuan, 610064, China.
Sichuan Provincial Key Laboratory of Universities on Environmental Science and Engineering, MOE Key Laboratory of Deep Earth Science and Engineering, College of Architecture and Environment, Sichuan University, Chengdu, Sichuan, 610064, China.
Nat Commun. 2024 Sep 27;15(1):8396. doi: 10.1038/s41467-024-52805-5.
Chemical derivatization is a powerful strategy for enhancing the sensitivity and selectivity of liquid chromatography-mass spectrometry in non-targeted analysis of chemicals in complex mixtures. However, it remains impossible to obtain large sets of reference spectra for chemically derived molecules (CDMs), which is a major barrier to real-world applications. Herein, we describe DeepCDM, a deep learning approach that enables accurate prediction of electrospray ionization tandem mass spectra for CDMs. DeepCDM is established by transfer learning from a generic spectrum-prediction model using a small set of experimentally acquired tandem mass spectra of CDMs, which converts a generic model with low predictability for CDMs into a specialized model with high predictability. We demonstrate DeepCDM by predicting electrospray ionization tandem mass spectra of dansylated molecules (Dns-MS). The success in establishing Dns-MS further enables the development of DnsBank, a dansylation-specialized in silico spectral library. DnsBank significantly increases the accurate annotation rate of dansylated molecules, facilitating the discovery of new hazardous pollutants in an environmental study of leather industrial wastewater. DeepCDM is also highly versatile for other classes of CDMs. We therefore envision that DeepCDM will pave the way for high-throughput identification of CDMs in non-targeted analysis, helping to uncover unknowns with potential health impacts among emerging anthropogenic chemicals.