Liu Yuting, Yoshizawa Akiyasu C, Ling Yiwei, Okuda Shujiro
Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan.
J Cheminform. 2024 Oct 7;16(1):113. doi: 10.1186/s13321-024-00905-1.
In untargeted metabolomics, structures of small molecules are annotated using liquid chromatography-mass spectrometry by leveraging information from the molecular retention time (RT) in the chromatogram and m/z (formerly called "mass-to-charge ratio") in the mass spectrum. However, correct identification of metabolites is challenging due to the vast array of small molecules. Various in silico tools for mass spectrometry peak alignment and compound prediction have therefore been developed, but the list of candidate compounds they return remains extensive. Accurate RT prediction is important to exclude false candidates and facilitate metabolite annotation. Recent advancements in artificial intelligence (AI) have led to significant breakthroughs in the use of deep learning models in various fields. The release of a large RT dataset has eased the bottlenecks that limited the application of deep learning models, making them more practical for RT prediction tasks. This review lists the databases that can be used to expand training datasets and addresses the issue of inconsistent molecular representations across datasets. It also discusses the application of AI technology for RT prediction, particularly in the 5 years following the release of the METLIN small molecule RT dataset. This review provides a comprehensive overview of the AI applications used for RT prediction, highlighting the progress and remaining challenges.
SCIENTIFIC CONTRIBUTION: This article focuses on the advancements in small molecule retention time prediction in computational metabolomics over the past five years, with a particular emphasis on the application of AI technologies in this field. It reviews the publicly available datasets for small molecule retention time, the molecular representation methods, and the AI algorithms applied in recent studies. Furthermore, it discusses the effectiveness of these models in assisting with the annotation of small molecule structures and the challenges that must be addressed to achieve practical applications.
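The role of RT prediction in excluding false candidates can be illustrated with a minimal sketch. It is not taken from the review: the candidate names, the predicted RT values, the use of seconds as the unit, and the fixed tolerance window are all hypothetical, and in practice the predicted RTs would come from a trained model rather than being hard-coded.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str            # hypothetical candidate label
    predicted_rt: float  # RT predicted by some model (assumed unit: seconds)


def filter_by_rt(candidates, observed_rt, tolerance):
    """Keep candidates whose predicted RT is within +/- tolerance of the observed RT."""
    return [c for c in candidates if abs(c.predicted_rt - observed_rt) <= tolerance]


# Hypothetical candidate list for one peak, e.g. compounds sharing the same m/z.
candidates = [
    Candidate("candidate_A", 312.0),
    Candidate("candidate_B", 655.0),
    Candidate("candidate_C", 340.0),
]

# Assumed observed RT and tolerance window, chosen only for illustration.
kept = filter_by_rt(candidates, observed_rt=320.0, tolerance=30.0)
print([c.name for c in kept])
```

With these illustrative numbers, the candidate whose predicted RT lies far from the observed peak is dropped, which shortens the list passed on to annotation. A fixed window is the simplest choice here; a tolerance derived from the model's prediction error would be a natural alternative.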