School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China.
CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, Liaoning, China.
J Chromatogr B Analyt Technol Biomed Life Sci. 2023 Feb 15;1217:123624. doi: 10.1016/j.jchromb.2023.123624. Epub 2023 Feb 4.
Retention time (RT) provides information orthogonal to that of mass spectrometry and contributes to compound identification. Many machine learning methods have been developed and applied to RT prediction. In practice, however, the training data available for most chromatographic systems are small. To enhance the performance of RT prediction, this study proposes an RT prediction method based on multi-data combinations and an adaptive neural network (MDC-ANN). MDC-ANN establishes the RT prediction model for the target chromatographic system through transfer learning from a base deep learning model trained on a large dataset. It selects the optimal combination of molecular representations from multiple input candidates and automatically determines the neural network structure according to the selected input combination. MDC-ANN was compared with two recent, efficient deep learning methods, three transfer learning methods and four popular machine learning methods on 14 small datasets, and showed advantages in MAE, MedAE, MRE and R in most cases. The experimental results illustrated that integrating multiple molecular representations can provide more information, improve the performance of RT prediction and contribute to compound annotation. They also showed that different chromatographic systems may need different combinations of molecular representations to obtain good RT prediction performance. Hence MDC-ANN, which automatically determines the best combination of molecular representations for a specific system, is promising for predicting RTs accurately in real applications.
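As an illustration of the evaluation and selection steps described in the abstract, the following is a minimal Python sketch and not the authors' implementation. The abstract does not define its metrics, so two definitions are assumed here: MRE is taken as the mean absolute error relative to the measured RT, and R as the Pearson correlation coefficient. The function names `rt_metrics` and `select_combination`, and the exhaustive search over subsets, are likewise hypothetical; the abstract does not state how MDC-ANN searches the candidate combinations.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, median


def rt_metrics(y_true, y_pred):
    """Return MAE, MedAE, MRE and Pearson R for paired RT values."""
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    mean_t, mean_p = mean(y_true), mean(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return {
        "MAE": mean(abs_err),
        "MedAE": median(abs_err),
        # Assumed definition: absolute error relative to the measured RT.
        "MRE": mean([e / t for e, t in zip(abs_err, y_true)]),
        # Assumed definition: Pearson correlation coefficient.
        "R": cov / sqrt(var_t * var_p),
    }


def select_combination(candidates, validation_mae):
    """Return the non-empty subset of candidates with the lowest score.

    validation_mae is a caller-supplied function that trains and evaluates
    a model on one combination (a tuple of names) and returns its MAE.
    Exhaustive search is an assumption made for this sketch.
    """
    subsets = [
        combo
        for size in range(1, len(candidates) + 1)
        for combo in combinations(candidates, size)
    ]
    return min(subsets, key=validation_mae)
```

In use, `validation_mae` would wrap whatever fine-tuning and validation procedure applies to the target chromatographic system, for example by returning `rt_metrics(y_val, model.predict(x_val))["MAE"]` for a model built on the given combination. Exhaustive search visits 2^n - 1 subsets, which is only practical for a handful of candidate representations.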