Ramsundar Bharath, Liu Bowen, Wu Zhenqin, Verras Andreas, Tudor Matthew, Sheridan Robert P, Pande Vijay
Department of Computer Science, Stanford University, Stanford, California 94305, United States.
Department of Chemistry, Stanford University, Stanford, California 94305, United States.
J Chem Inf Model. 2017 Aug 28;57(8):2068-2076. doi: 10.1021/acs.jcim.7b00146. Epub 2017 Aug 1.
Multitask deep learning has emerged as a powerful tool for computational drug discovery. However, despite a number of preliminary studies, multitask deep networks have yet to be widely deployed in the pharmaceutical and biotech industries. This slow adoption stems from both software difficulties and a limited understanding of the robustness of multitask deep networks. Our work aims to resolve both of these barriers to adoption. We introduce a high-quality open-source implementation of multitask deep networks as part of the DeepChem open-source platform. Our implementation enables simple Python scripts to construct, fit, and evaluate sophisticated deep models. We use our implementation to analyze the performance of multitask deep networks and related deep models on four collections of pharmaceutical data (three of which have not previously been analyzed in the literature). We split these data sets into training, validation, and test sets using time and neighbor splits to test multitask deep learning performance under challenging conditions. Our results demonstrate that multitask deep networks are surprisingly robust and can offer strong improvements over random forests. Our analysis and open-source implementation in DeepChem provide an argument that multitask deep networks are ready for widespread use in commercial drug discovery.
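To make the idea of a time split concrete, the following is a minimal sketch in plain Python. It is not the DeepChem implementation and does not reproduce the paper's exact procedure: the function name `time_split`, the 80/10/10 fractions, and the example years are all assumptions chosen for illustration. The sketch orders compounds by a timestamp and assigns the oldest to training and the newest to testing, so that the model is evaluated on compounds recorded after everything it was trained on.

```python
from typing import List, Sequence, Tuple


def time_split(
    timestamps: Sequence[float],
    frac_train: float = 0.8,
    frac_valid: float = 0.1,
) -> Tuple[List[int], List[int], List[int]]:
    """Return (train, valid, test) index lists, oldest compounds first.

    Hypothetical helper for illustration; the fractions are assumed defaults.
    """
    if not 0.0 < frac_train < 1.0 or not 0.0 <= frac_valid < 1.0:
        raise ValueError("fractions must lie in [0, 1)")
    if frac_train + frac_valid > 1.0:
        raise ValueError("frac_train + frac_valid must not exceed 1")

    # Order compound indices by timestamp. sorted() is stable, so compounds
    # sharing a timestamp keep their original relative order.
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])

    n = len(order)
    train_end = int(frac_train * n)
    valid_end = int((frac_train + frac_valid) * n)

    # Oldest block trains the model, the next block tunes it, and the
    # newest block is held out for testing.
    return order[:train_end], order[train_end:valid_end], order[valid_end:]


# Hypothetical registration years for ten compounds (illustrative values only).
years = [2011, 2003, 2015, 2007, 2009, 2001, 2013, 2005, 2017, 2019]
train_idx, valid_idx, test_idx = time_split(years)
```

Unlike a random split, this arrangement never lets the model see compounds newer than those it is tested on, which is one reason such splits are considered a harder test of generalization.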