Evaluating point-prediction uncertainties in neural networks for protein-ligand binding prediction.

Author information

Fan Ya Ju, Allen Jonathan E, McLoughlin Kevin S, Shi Da, Bennion Brian J, Zhang Xiaohua, Lightstone Felice C

Affiliations

Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, 7000 East Ave., Livermore, CA, USA.

Biological Science and Security Center, Lawrence Livermore National Laboratory, Livermore, CA, USA.

Publication information

Artif Intell Chem. 2023 Jun;1(1). doi: 10.1016/j.aichem.2023.100004. Epub 2023 Jun 3.

Abstract

Neural network (NN) models have the potential to speed up the drug discovery process and reduce its failure rates. Because drug discovery explores chemical space beyond the training data distribution, the success of NN models requires uncertainty quantification (UQ). Standard NN models do not provide uncertainty information, and some UQ methods require changing the NN architecture or training procedure, which limits the choice of NN models. Moreover, predictive uncertainty can come from different sources. It is important to be able to model different types of predictive uncertainty separately, because the model can take different actions depending on the source of uncertainty. In this paper, we examine UQ methods that estimate different sources of predictive uncertainty for NN models aimed at protein-ligand binding prediction. We use our prior knowledge of chemical compounds to design the experiments. Using a visualization method, we create non-overlapping and chemically diverse partitions from a collection of chemical compounds. These partitions are used as training and test set splits to explore NN model uncertainty. We demonstrate how the uncertainties estimated by the selected methods describe different sources of uncertainty under different partitions and featurization schemes, and how they relate to prediction error.
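
The abstract does not name the specific UQ methods used. As a generic illustration of what "separately modeling different sources of predictive uncertainty" can mean, one common approach (not necessarily the one used in this paper) is to train an ensemble of networks that each output a predicted mean and variance for a compound's binding affinity, then split the total predictive variance with the law of total variance: the average of the predicted variances estimates aleatoric (data-noise) uncertainty, and the spread of the predicted means estimates epistemic (model) uncertainty, which is expected to grow for compounds far from the training distribution. The function name `decompose_uncertainty` and the example numbers below are hypothetical.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Split one compound's ensemble predictive variance into two parts.

    Hypothetical helper. Assumes ensemble member i predicts a Gaussian
    with mean means[i] and variance variances[i] for the same compound.
    Returns (ensemble_mean, aleatoric, epistemic, total).
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one (mean, variance) pair per ensemble member")
    ensemble_mean = fmean(means)
    # Aleatoric part: average noise level the members themselves predict.
    aleatoric = fmean(variances)
    # Epistemic part: disagreement between members' predicted means
    # (population variance, i.e. divided by the number of members).
    epistemic = pvariance(means)
    # Law of total variance: total = E[var] + Var[mean].
    return ensemble_mean, aleatoric, epistemic, aleatoric + epistemic


# Example with three hypothetical ensemble members for one compound.
mean, alea, epi, total = decompose_uncertainty([6.1, 6.3, 5.9], [0.20, 0.25, 0.15])
print(f"mean={mean:.2f} aleatoric={alea:.3f} epistemic={epi:.3f} total={total:.3f}")
```

Under this sketch, a compound in a test partition chemically distant from the training partition would be expected to show a larger epistemic term, while noisy assay measurements would mainly inflate the aleatoric term.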
