Strobel Michael, Gil-de-la-Fuente Alberto, Zare Shahneh Mohammad Reza, Abiead Yasin El, Bushuiev Roman, Bushuiev Anton, Pluskal Tomáš, Wang Mingxun
Department of Computer Science and Engineering, University of California Riverside, 900 University Ave., Riverside, CA, 92521, USA.
Information Technologies Department, Escuela Politécnica Superior, Universidad San Pablo-CEU, CEU Universities, Urbanización Montepríncipe, Boadilla del Monte, 28668, Madrid, Spain.
BMC Bioinformatics. 2025 Jul 11;26(1):174. doi: 10.1186/s12859-025-06194-1.
Untargeted tandem mass spectrometry serves as a scalable solution for the organization of small molecules. One of the most prevalent techniques for analyzing the acquired tandem mass spectrometry (MS/MS) data, called molecular networking, organizes and visualizes putatively structurally related compounds. However, a key bottleneck of this approach is the comparison of MS/MS spectra used to identify structural neighbors. Machine learning (ML) approaches have emerged as a promising way to predict structural similarity from MS/MS spectra and may surpass the current state-of-the-art algorithmic methods. Comparing these ML methods remains a challenge, however, because there is no standardized way to benchmark, evaluate, and compare MS/MS similarity methods, and no existing method addresses data leakage between training and test data in order to analyze model generalizability.
In this work, we present a new evaluation methodology built on a train/test split that allows machine learning models to be evaluated at varying degrees of structural similarity between training and test sets. We also introduce a training and evaluation framework that measures prediction accuracy on domain-inspired annotation and retrieval metrics designed to mirror real-world applications. We further show how two alternative training methods that leverage MS-specific insights (e.g., similar instrumentation, collision energy, adduct) affect method performance and demonstrate the orthogonality of the proposed metrics. We especially highlight the role that collision energy plays in prediction errors. Finally, we release a continually updated version of our dataset online along with our data cleaning and splitting pipelines for community use.
It is our hope that this benchmark will serve as the basis of development for future machine learning approaches in MS/MS similarity and facilitate comparison between models. We anticipate that the introduced set of evaluation metrics allows for a better reflection of practical performance.
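As an illustration of what evaluating "at varying degrees of structural similarity between training and test sets" could look like in practice, the following is a minimal sketch and not the authors' pipeline. It assumes each molecule is represented by a set of fingerprint bit indices, uses Tanimoto similarity as the structural measure, and groups test molecules by their highest similarity to any training molecule. The function names, the bin edges, and the toy fingerprints are all hypothetical.

```python
from bisect import bisect_right


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def max_train_similarity(test_fp: frozenset, train_fps: list) -> float:
    """Highest similarity between one test molecule and any training molecule."""
    return max((tanimoto(test_fp, t) for t in train_fps), default=0.0)


def bin_test_set(test_fps: dict, train_fps: list, edges=(0.25, 0.5, 0.75)) -> dict:
    """Group test molecule names by the bin of their max train similarity.

    Bin 0 holds molecules least similar to the training set; the last bin
    holds the most similar ones. The edges are illustrative assumptions.
    """
    bins = {i: [] for i in range(len(edges) + 1)}
    for name, fp in test_fps.items():
        sim = max_train_similarity(fp, train_fps)
        bins[bisect_right(edges, sim)].append(name)
    return bins


# Toy fingerprints (hypothetical bit indices, not real molecules).
train = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
test = {
    "near": frozenset({1, 2, 3, 4, 5}),  # shares 4 of 5 bits with train[0]
    "mid": frozenset({1, 2, 7, 8}),      # shares 2 of 6 bits with train[0]
    "far": frozenset({20, 21}),          # shares no bits with any train entry
}
bins = bin_test_set(test, train)
```

Reporting a model's metrics separately for each bin would then show how performance changes as test molecules become less similar to anything seen during training, which is one way to probe generalizability.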