Hupatz Henrik, Rahu Ida, Wang Wei-Chieh, Peets Pilleriin, Palm Emma H, Kruve Anneli
Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden.
Stockholm University Center for Circular and Sustainable Systems (SUCCeSS), Stockholm University, 106 91, Stockholm, Sweden.
Anal Bioanal Chem. 2025 Jan;417(3):473-493. doi: 10.1007/s00216-024-05471-x. Epub 2024 Aug 14.
Non-targeted screening with liquid chromatography coupled to high-resolution mass spectrometry (LC/HRMS) is increasingly leveraging in silico methods, including machine learning, to obtain candidate structures for structural annotation of LC/HRMS features and their further prioritization. Candidate structures are commonly retrieved based on the tandem mass spectral information from either spectral or structural databases; however, the vast majority of the detected LC/HRMS features remain unannotated, constituting what we refer to as a part of the unknown chemical space. Recently, generative models have made this chemical space accessible to exploration. Furthermore, the evaluation of the candidate structures benefits from complementary empirical analytical information such as retention time, collision cross section values, and ionization type. In this critical review, we provide an overview of the current approaches for retrieving and prioritizing candidate structures. These approaches come with their own set of advantages and limitations, as we illustrate with the structural annotation of ten known and ten unknown LC/HRMS features. We emphasize that these limitations stem from both experimental and computational considerations. Finally, we highlight three key considerations for the future development of in silico methods.