Comprehensive Study on Molecular Supervised Learning with Graph Neural Networks.

Affiliations

AITRICS, Hyoryoung-ro 77-gil, Seocho-gu, Seoul, Republic of Korea.

Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea.

Publication information

J Chem Inf Model. 2020 Dec 28;60(12):5936-5945. doi: 10.1021/acs.jcim.0c00416. Epub 2020 Nov 8.

DOI:10.1021/acs.jcim.0c00416

PMID:33164522

Abstract

This work considers strategies to develop accurate and reliable graph neural networks (GNNs) for molecular property predictions. Prediction performance of GNNs is highly sensitive to the change in various parameters due to the inherent challenges in molecular machine learning, such as a deficient amount of data samples and bias in data distribution. Comparative studies with well-designed experiments are thus important to clearly understand which GNNs are powerful for molecular supervised learning. Our work presents a number of ablation studies along with a guideline to train and utilize GNNs for both molecular regression and classification tasks. First, we validate that using both atomic and bond meta-information improves the prediction performance in the regression task. Second, we find that the graph isomorphism hypothesis proposed by [Xu, K. How powerful are graph neural networks? 2018, arXiv:1810.00826. arXiv.org e-Print archive. https://arxiv.org/abs/1810.00826] is valid for the regression task. Surprisingly, however, the findings above do not hold for the classification tasks. Beyond the study on model architectures, we test various regularization methods and Bayesian learning algorithms to find the best strategy to achieve a reliable classification system. We demonstrate that regularization methods penalizing predictive entropy might not give well-calibrated probability estimation, even though they work well in other domains, and Bayesian learning methods are capable of developing reliable prediction systems. Furthermore, we argue the importance of Bayesian learning in virtual screening by showing that well-calibrated probability estimation may lead to a higher success rate.

Similar articles

MGLNN: Semi-supervised learning via Multiple Graph Cooperative Learning Neural Networks.

Neural Netw. 2022 Sep;153:204-214. doi: 10.1016/j.neunet.2022.05.024. Epub 2022 Jun 3.

DGCL: dual-graph neural networks contrastive learning for molecular property prediction.

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae474.

A Comprehensive Survey on Graph Neural Networks.

IEEE Trans Neural Netw Learn Syst. 2021 Jan;32(1):4-24. doi: 10.1109/TNNLS.2020.2978386. Epub 2021 Jan 4.

A unified deep semi-supervised graph learning scheme based on nodes re-weighting and manifold regularization.

Neural Netw. 2023 Jan;158:188-196. doi: 10.1016/j.neunet.2022.11.017. Epub 2022 Nov 19.

Graph Convolution Networks with manifold regularization for semi-supervised learning.

Neural Netw. 2020 Jul;127:160-167. doi: 10.1016/j.neunet.2020.04.016. Epub 2020 Apr 23.

k-hop graph neural networks.

Neural Netw. 2020 Oct;130:195-205. doi: 10.1016/j.neunet.2020.07.008. Epub 2020 Jul 10.

Augmented Graph Neural Network with hierarchical global-based residual connections.

Neural Netw. 2022 Jun;150:149-166. doi: 10.1016/j.neunet.2022.03.008. Epub 2022 Mar 10.

Graph Transformer Networks: Learning meta-path graphs to improve GNNs.

Neural Netw. 2022 Sep;153:104-119. doi: 10.1016/j.neunet.2022.05.026. Epub 2022 Jun 4.

XGraphBoost: Extracting Graph Neural Network-Based Features for a Better Prediction of Molecular Properties.

J Chem Inf Model. 2021 Jun 28;61(6):2697-2705. doi: 10.1021/acs.jcim.0c01489. Epub 2021 May 19.

Cited by

AI-Driven Biomarker Discovery and Personalized Allergy Treatment: Utilizing Machine Learning and NGS.

Curr Allergy Asthma Rep. 2025 Jun 3;25(1):27. doi: 10.1007/s11882-025-01207-8.

Reducing overconfident errors in molecular property classification using Posterior Network.

Patterns (N Y). 2024 May 8;5(6):100991. doi: 10.1016/j.patter.2024.100991. eCollection 2024 Jun 14.

MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations.

Bioinformatics. 2024 Jun 28;40(Suppl 1):i357-i368. doi: 10.1093/bioinformatics/btae260.

Improving chemical reaction yield prediction using pre-trained graph neural networks.

J Cheminform. 2024 Mar 1;16(1):25. doi: 10.1186/s13321-024-00818-z.

Machine Learning Methods for Small Data Challenges in Molecular Science.

Chem Rev. 2023 Jul 12;123(13):8736-8780. doi: 10.1021/acs.chemrev.3c00189. Epub 2023 Jun 29.

Characterizing Uncertainty in Machine Learning for Chemistry.

J Chem Inf Model. 2023 Jul 10;63(13):4012-4029. doi: 10.1021/acs.jcim.3c00373. Epub 2023 Jun 20.

Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation.

J Cheminform. 2023 Apr 28;15(1):49. doi: 10.1186/s13321-023-00709-9.

Drug-likeness scoring based on unsupervised learning.

Chem Sci. 2021 Dec 14;13(2):554-565. doi: 10.1039/d1sc05248a. eCollection 2022 Jan 5.

Selecting molecules with diverse structures and properties by maximizing submodular functions of descriptors learned with graph neural networks.

Sci Rep. 2022 Jan 21;12(1):1124. doi: 10.1038/s41598-022-04967-9.

DeepReac+: deep active learning for quantitative modeling of organic chemical reactions.

Chem Sci. 2021 Oct 9;12(43):14459-14472. doi: 10.1039/d1sc02087k. eCollection 2021 Nov 10.
