
Demystifying Multitask Deep Neural Networks for Quantitative Structure-Activity Relationships.

Author Information

Xu Yuting, Ma Junshui, Liaw Andy, Sheridan Robert P, Svetnik Vladimir

Affiliations

Biometrics Research Department, Merck & Co., Inc., Rahway, New Jersey 07065, United States.

Modeling and Informatics Department, Merck & Co., Inc., Kenilworth, New Jersey 07033, United States.

Publication Information

J Chem Inf Model. 2017 Oct 23;57(10):2490-2504. doi: 10.1021/acs.jcim.7b00087. Epub 2017 Oct 2.

Abstract

Deep neural networks (DNNs) are complex computational models that have found great success in many artificial intelligence applications, such as computer vision[1,2] and natural language processing.[3,4] In the past four years, DNNs have also generated promising results for quantitative structure-activity relationship (QSAR) tasks.[5,6] Previous work showed that DNNs can routinely make better predictions than traditional methods, such as random forests, on a diverse collection of QSAR data sets. It was also found that multitask DNN models (those trained on and predicting multiple QSAR properties simultaneously) outperform DNNs trained separately on the individual data sets in many, but not all, tasks. To date there has been no satisfactory explanation of why the QSAR of one task embedded in a multitask DNN can borrow information from other unrelated QSAR tasks. Thus, using multitask DNNs in a way that consistently provides a predictive advantage becomes a challenge. In this work, we explored why multitask DNNs make a difference in predictive performance. Our results show that during prediction a multitask DNN does borrow "signal" from molecules with similar structures in the training sets of the other tasks. However, whether this borrowing leads to better or worse predictive performance depends on whether the activities are correlated. On the basis of this, we have developed a strategy to use multitask DNNs that incorporates prior domain knowledge to select training sets with correlated activities, and we demonstrate its effectiveness on several examples.
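
The multitask setup the abstract describes is easiest to picture in code: a shared trunk of hidden layers learns one representation of the molecular descriptors, and each QSAR task gets its own output head, which is the pathway by which "signal" from structurally similar molecules in other tasks' training sets can influence a given task's predictions. Below is a minimal PyTorch sketch; the class name, layer sizes, and dropout rate are illustrative assumptions, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

class MultitaskQSARNet(nn.Module):
    """Illustrative multitask DNN: a shared trunk maps molecular
    descriptors to a joint representation; one linear head per QSAR
    task predicts that task's activity from the shared features."""

    def __init__(self, n_descriptors, n_tasks, hidden=(1000, 500)):
        super().__init__()
        layers, dim = [], n_descriptors
        for h in hidden:  # hidden sizes are illustrative, not the paper's
            layers += [nn.Linear(dim, h), nn.ReLU(), nn.Dropout(0.25)]
            dim = h
        self.trunk = nn.Sequential(*layers)      # shared by all tasks
        self.heads = nn.ModuleList(
            nn.Linear(dim, 1) for _ in range(n_tasks)
        )

    def forward(self, x):
        z = self.trunk(x)
        # one activity prediction per task, all from the same representation
        return torch.cat([head(z) for head in self.heads], dim=1)
```

In practice each molecule usually has a measured activity for only some of the tasks, so training code of this kind would mask the loss on missing labels; that bookkeeping is omitted here for brevity.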
