Department of Pharmaceutical Chemistry, Department of Bioengineering and Therapeutic Sciences, Bakar Computational Health Sciences Institute, Kavli Institute for Fundamental Neuroscience, Institute for Neurodegenerative Diseases, University of California, San Francisco, 675 Nelson Rising Ln NS 416A, San Francisco, California 94143, United States.
J Chem Inf Model. 2020 Dec 28;60(12):5957-5970. doi: 10.1021/acs.jcim.0c00565. Epub 2020 Nov 27.
Multitask deep neural networks learn to predict ligand-target binding by example, yet public pharmacological data sets are sparse, imbalanced, and approximate. We constructed two hold-out benchmarks to approximate temporal and drug-screening test scenarios, whose characteristics differ from a random split of conventional training data sets. We developed a pharmacological data set augmentation procedure, Stochastic Negative Addition (SNA), which randomly assigns untested molecule-target pairs as transient negative examples during training. Under the SNA procedure, drug-screening benchmark performance increases from R² = 0.1926 ± 0.0186 to 0.4269 ± 0.0272 (a 122% increase). This gain was accompanied by a modest decrease in temporal benchmark performance (13%). The SNA gains in drug-screening performance were consistent across classification and regression tasks and outperformed y-randomized controls. Our results highlight where data and feature uncertainty may be problematic and how incorporating uncertainty into training improves predictions of drug-target relationships.
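The idea behind Stochastic Negative Addition, as the abstract describes it, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a dense molecule-by-target label matrix in which NaN marks an untested pair, and the function name `stochastic_negative_addition` and the parameters `ratio` and `negative_value` are hypothetical: the abstract does not state how many negatives are sampled or what label value they receive.

```python
import numpy as np


def stochastic_negative_addition(labels, ratio=1.0, negative_value=0.0, rng=None):
    """Return a copy of `labels` with random untested pairs marked negative.

    labels         : 2D float array (molecules x targets); NaN means untested.
    ratio          : negatives added per known label (assumed, not from the paper).
    negative_value : label given to sampled negatives (assumed, not from the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    augmented = labels.copy()  # leave the measured data untouched

    # Flat (row-major) indices of every untested molecule-target pair.
    untested = np.flatnonzero(np.isnan(labels))
    n_known = labels.size - untested.size

    # Never request more negatives than there are untested pairs.
    n_add = min(untested.size, int(round(ratio * n_known)))

    # Sample without replacement so each pair is chosen at most once.
    chosen = rng.choice(untested, size=n_add, replace=False)
    augmented.flat[chosen] = negative_value
    return augmented


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    measured = np.full((4, 5), np.nan)
    measured[0, 0] = 7.5  # illustrative known activity values
    measured[1, 2] = 5.0

    # Resampling from the original matrix each epoch keeps the negatives
    # transient: a pair treated as negative in one epoch may be untested
    # again in the next.
    for epoch in range(3):
        epoch_labels = stochastic_negative_addition(measured, ratio=2.0, rng=rng)
        print(epoch, int((~np.isnan(epoch_labels)).sum()))
```

Drawing a fresh sample from the unmodified matrix on every epoch is what makes the negatives transient in this sketch: no untested pair is permanently relabeled, so the measured data stays the only fixed supervision.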