Comprehensive Study on Molecular Supervised Learning with Graph Neural Networks.

Affiliations

AITRICS, Hyoryoung-ro 77-gil, Seocho-gu, Seoul, Republic of Korea.

Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea.

Publication information

J Chem Inf Model. 2020 Dec 28;60(12):5936-5945. doi: 10.1021/acs.jcim.0c00416. Epub 2020 Nov 8.

Abstract

This work considers strategies for developing accurate and reliable graph neural networks (GNNs) for molecular property prediction. The prediction performance of GNNs is highly sensitive to changes in various parameters because of challenges inherent to molecular machine learning, such as scarce data samples and biased data distributions. Well-designed comparative studies are thus important for understanding which GNNs are powerful for molecular supervised learning. Our work presents a number of ablation studies along with a guideline for training and using GNNs in both molecular regression and classification tasks. First, we validate that using both atomic and bond meta-information improves prediction performance in the regression task. Second, we find that the graph isomorphism hypothesis proposed by [Xu, K. How powerful are graph neural networks? 2018, arXiv:1810.00826. arXiv.org e-Print archive. https://arxiv.org/abs/1810.00826] is valid for the regression task. Surprisingly, however, neither finding holds for the classification tasks. Beyond the study of model architectures, we test various regularization methods and Bayesian learning algorithms to find the best strategy for achieving a reliable classification system. We demonstrate that regularization methods penalizing predictive entropy might not give well-calibrated probability estimates, even though they work well in other domains, whereas Bayesian learning methods are capable of producing reliable prediction systems. Furthermore, we argue for the importance of Bayesian learning in virtual screening by showing that well-calibrated probability estimates may lead to a higher success rate.
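
To make the two quantities mentioned in the abstract concrete, the following is a minimal sketch of predictive entropy and of a binned calibration error for a binary classifier. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which calibration metric was used, so the binned expected calibration error (ECE) shown here is an assumed, commonly used choice, and the function names and the `n_bins` parameter are hypothetical.

```python
import math


def predictive_entropy(p):
    """Entropy (in nats) of a Bernoulli prediction with positive-class probability p."""
    if p <= 0.0 or p >= 1.0:
        # A fully confident prediction carries zero entropy.
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary predictions (assumed metric, not taken from the paper).

    Predictions are grouped into equal-width bins by predicted probability.
    Within each bin, the mean predicted probability is compared with the
    observed fraction of positive labels; the gaps are averaged with weights
    proportional to bin size.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_prob - frac_pos)
    return ece


if __name__ == "__main__":
    # Toy data: ten molecules all predicted active with probability 0.9.
    probs = [0.9] * 10
    calibrated = [1] * 9 + [0]       # 90% are truly active
    overconfident = [1] * 5 + [0] * 5  # only 50% are truly active
    print(expected_calibration_error(probs, calibrated))
    print(expected_calibration_error(probs, overconfident))
    print(predictive_entropy(0.5), predictive_entropy(0.9))
```

In this toy setting the overconfident model has the larger calibration error even though both models emit identical probabilities, which is the kind of gap a virtual-screening workflow that ranks molecules by predicted probability would be sensitive to.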
