Scripps Center for Metabolomics, The Scripps Research Institute, La Jolla, CA, USA.
Centre for Omic Sciences, EURECAT - Technology Centre of Catalonia & Rovira i Virgili University joint unit, Reus, Catalonia, Spain.
Nat Commun. 2019 Dec 20;10(1):5811. doi: 10.1038/s41467-019-13680-7.
Machine learning has been extensively applied in small molecule analysis to predict a wide range of molecular properties and processes including mass spectrometry fragmentation or chromatographic retention time. However, current approaches for retention time prediction lack sufficient accuracy due to limited available experimental data. Here we introduce the METLIN small molecule retention time (SMRT) dataset, an experimentally acquired reverse-phase chromatography retention time dataset covering up to 80,038 small molecules. To demonstrate the utility of this dataset, we deployed a deep learning model for retention time prediction applied to small molecule annotation. Results showed that in 70% of the cases, the correct molecular identity was ranked among the top 3 candidates based on their predicted retention time. We anticipate that this dataset will enable the community to apply machine learning or first principles strategies to generate better models for retention time prediction.
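The top-3 figure reported above can be read as a top-k ranking rate. The following is a minimal sketch of how such a rate might be computed, not the authors' code: it assumes each annotation case provides an observed retention time and a predicted retention time for every candidate, and that candidates are ranked by the absolute difference between the two. The function names, the dictionary layout, and the retention time values are all hypothetical.

```python
from typing import Dict, List


def rank_candidates(observed_rt: float, predicted_rt: Dict[str, float]) -> List[str]:
    """Order candidate identifiers by |predicted RT - observed RT|, closest first."""
    return sorted(predicted_rt, key=lambda name: abs(predicted_rt[name] - observed_rt))


def top_k_rate(cases: List[dict], k: int = 3) -> float:
    """Fraction of cases whose true identity is among the k closest candidates."""
    hits = sum(
        case["truth"] in rank_candidates(case["observed_rt"], case["predicted_rt"])[:k]
        for case in cases
    )
    return hits / len(cases)


# Made-up retention times in seconds, for illustration only (not SMRT data).
cases = [
    {"truth": "A", "observed_rt": 600.0,
     "predicted_rt": {"A": 610.0, "B": 450.0, "C": 900.0, "D": 595.0}},
    {"truth": "X", "observed_rt": 300.0,
     "predicted_rt": {"W": 310.0, "X": 700.0, "Y": 290.0, "Z": 320.0}},
]

print(top_k_rate(cases, k=3))
```

In this toy input the true identity falls within the three closest candidates in the first case but not in the second, so the top-3 rate is 0.5. A real evaluation would draw the candidate lists and predictions from the annotation workflow and the trained model.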