School of Engineering, Westlake University, Hangzhou, Zhejiang, 310024, China.
School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, 210096, China.
J Chromatogr A. 2023 Nov 22;1711:464439. doi: 10.1016/j.chroma.2023.464439. Epub 2023 Oct 13.
Retention time (RT) is a crucial source of information in liquid chromatography-mass spectrometry (LCMS). A model that accurately predicts the RT of each molecule would make it possible to filter out candidates with similar spectra but different RTs in LCMS-based molecule identification. Recent research shows that graph neural networks (GNNs) outperform traditional machine learning algorithms in RT prediction. However, all of these models use relatively shallow GNNs. This study investigates, for the first time, how depth affects GNN performance on RT prediction. The results demonstrate that a notable improvement can be achieved by increasing the depth of the GNN to 16 layers through the adoption of residual connections. We also find that the graph convolutional network (GCN) model benefits from edge information. The resulting deep graph convolutional network, DeepGCN-RT, significantly outperforms the previous state-of-the-art method and achieves the lowest mean absolute percentage error (MAPE) of 3.3% and the lowest mean absolute error (MAE) of 26.55 s on the SMRT test set. We also fine-tune DeepGCN-RT on seven datasets with various chromatographic conditions. The mean MAE across the seven datasets decreases by 30% compared with the previous state-of-the-art method. On the RIKEN-PlaSMA dataset, we also test the effectiveness of DeepGCN-RT in assisting molecular structure identification. By reducing the number of candidate structures by 30%, DeepGCN-RT improves top-1 accuracy by about 11%.
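As a rough illustration of the two error metrics reported above and of RT-based candidate filtering, the following minimal sketch may help. It is not the authors' code: the function names, the toy retention times, and the 30 s tolerance are all hypothetical, and the abstract does not describe the actual filtering rule used on RIKEN-PlaSMA.

```python
from typing import List, Sequence, Tuple


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE in the same unit as the inputs (seconds for retention times)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAPE in percent; assumes every true retention time is non-zero."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def filter_candidates(
    candidates: Sequence[Tuple[str, float]],
    observed_rt: float,
    tolerance_s: float,
) -> List[str]:
    """Keep candidates whose predicted RT lies within tolerance_s of the observed RT.

    Hypothetical filtering rule, chosen only for illustration.
    """
    return [name for name, rt in candidates if abs(rt - observed_rt) <= tolerance_s]


# Toy data (made up, in seconds), not taken from SMRT or any other dataset.
true_rt = [100.0, 200.0, 400.0]
pred_rt = [110.0, 190.0, 420.0]
print(f"MAE  = {mean_absolute_error(true_rt, pred_rt):.2f} s")
print(f"MAPE = {mean_absolute_percentage_error(true_rt, pred_rt):.2f} %")

# Candidates with similar spectra but different predicted RTs (hypothetical).
candidates = [("A", 395.0), ("B", 460.0), ("C", 410.0)]
print(filter_candidates(candidates, observed_rt=400.0, tolerance_s=30.0))
```

In this sketch a candidate is discarded when its predicted RT falls outside a fixed window around the observed RT. A tolerance tied to the model's error on the target chromatographic system would be one plausible choice, but that is an assumption here.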