Zhang Yan, Liu Fei, Li Xiu Qin, Gao Yan, Li Kang Cong, Zhang Qing He
Key Laboratory of Groundwater Conservation of MWR, China University of Geosciences, Beijing, 100083, People's Republic of China.
Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China.
Sci Data. 2024 Aug 29;11(1):946. doi: 10.1038/s41597-024-03780-5.
Quantitative structure-property relationships have been studied extensively for predicting retention times in liquid chromatography (LC). However, transferable prediction is inherently difficult because retention time depends on both the structure of the molecule and the chromatographic method used. Despite decades of development and numerous published machine learning models, retention time prediction for small molecules has seen limited practical use. The resulting models are typically restricted to the specific chromatographic conditions and molecules used in their training and evaluation. Here, we have developed a comprehensive dataset comprising over 10,000 experimental retention times, derived from 30 different reversed-phase liquid chromatography methods for a collection of 343 small molecules representing a wide range of chemical structures. The chromatographic methods cover common LC setups for studying the retention behavior of small molecules and offer a wide range of examples for modeling retention time under different LC setups.