Bui-Thi Danh, Liu Youzhong, Lippens Jennifer L, Laukens Kris, De Vijlder Thomas
Computer Science Department, University of Antwerp, Middelheimlaan 1, 2020, Antwerp, Belgium.
Therapeutic Development and Supply, Janssen Pharmaceutica N.V., Turnhoutseweg 30, 2340, Beerse, Belgium.
J Cheminform. 2024 May 28;16(1):61. doi: 10.1186/s13321-024-00858-5.
Small molecule identification is a crucial task in analytical chemistry and life sciences. One of the most commonly used technologies to elucidate small molecule structures is mass spectrometry. Spectral library search of product ion spectra (MS/MS) is a popular strategy to identify compounds or find structural analogues. This approach relies on the assumption that spectral similarity and structural similarity are correlated. However, popular spectral similarity measures, usually calculated from identical fragment matches between the MS/MS spectra, do not always accurately reflect structural similarity. In this study, we propose TransExION, a Transformer-based Explainable similarity metric for IONS. TransExION detects related fragments between MS/MS spectra through their mass differences and uses these to estimate spectral similarity. These related fragments can be nearly identical, but they can also merely share a substructure. TransExION also provides a post hoc explanation of its estimate, which can support scientists in evaluating spectral library search results and thus in the structure elucidation of unknown molecules. Our model has a Transformer-based architecture and is trained on data derived from GNPS MS/MS libraries. The experimental results show that it improves on existing spectral similarity measures in searching for and interpreting structural analogues as well as in molecular networking. SCIENTIFIC CONTRIBUTION: We propose a Transformer-based spectral similarity metric that improves the comparison of small molecule tandem mass spectra. We provide a post hoc explanation that can serve as a good starting point for annotating unknown spectra based on database spectra.
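To make the contrast between identical fragment matches and fragments related through a mass difference concrete, the following is a minimal sketch and not the TransExION model itself, which learns from such differences with a Transformer. The function names, the m/z values, the 0.01 Da tolerance, and the choice of a CH2 shift (about 14.01565 Da) are all assumptions made for illustration.

```python
import numpy as np

# Illustrative only: hypothetical helpers, not part of TransExION.

def pairwise_mass_differences(mz_a, mz_b):
    """Return matrix D with D[i, j] = mz_a[i] - mz_b[j]."""
    a = np.asarray(mz_a, dtype=float)
    b = np.asarray(mz_b, dtype=float)
    return a[:, None] - b[None, :]

def identical_matches(mz_a, mz_b, tol=0.01):
    """Index pairs (i, j) whose m/z values agree within tol (Da)."""
    diff = pairwise_mass_differences(mz_a, mz_b)
    rows, cols = np.where(np.abs(diff) <= tol)
    return [(int(i), int(j)) for i, j in zip(rows, cols)]

def shifted_matches(mz_a, mz_b, shift, tol=0.01):
    """Index pairs (i, j) where mz_a[i] - mz_b[j] is within tol of shift."""
    diff = pairwise_mass_differences(mz_a, mz_b)
    rows, cols = np.where(np.abs(diff - shift) <= tol)
    return [(int(i), int(j)) for i, j in zip(rows, cols)]

# Hypothetical fragment m/z values for a query and a library spectrum.
query_mz = [91.054, 119.049, 163.075]
library_mz = [91.054, 105.033, 149.060]
CH2_SHIFT = 14.01565  # assumed example shift (one CH2 group)

same = identical_matches(query_mz, library_mz)
related = shifted_matches(query_mz, library_mz, CH2_SHIFT)
print("identical:", same)
print("related by CH2 shift:", related)
```

In this toy case only one fragment pair matches exactly, while two further pairs differ by the assumed shift. A measure built only on identical matches would ignore the latter two pairs, which is the kind of relation the abstract says TransExION exploits.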