Department of Chemistry and Biochemistry, Ohio University, Athens, Ohio 45701, United States.
J Chem Inf Model. 2019 Mar 25;59(3):1005-1016. doi: 10.1021/acs.jcim.8b00671. Epub 2019 Jan 11.
Deep learning has drawn significant attention in many areas, including drug discovery. It has been proposed that deep learning could outperform other machine learning algorithms, especially on large data sets. In the pharmaceutical industry, machine learning models are built to understand quantitative structure-activity relationships (QSARs) and to predict molecular activities, including absorption, distribution, metabolism, and excretion (ADME) properties, using only molecular structures. Previous reports have demonstrated the advantages of using deep neural networks (DNNs) for QSAR modeling. One challenge in building DNN models is identifying the hyperparameters that lead to better generalization. In this study, we investigated several tunable hyperparameters of DNN models on 24 industrial ADME data sets. We analyzed the sensitivity and influence of five hyperparameters: the learning rate, the weight decay for L2 regularization, the dropout rate, the activation function, and the use of batch normalization. This paper focuses on strategies and practices for DNN model building. Further, an optimized model was built for each data set and compared with the benchmark models used in production. Based on our benchmarking results, we propose several practices for building DNN QSAR models.
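The five hyperparameters named in the abstract can be written down as a search space. The sketch below is only an illustration: the abstract does not give the candidate values, the search strategy, or any code, so every value, the `SEARCH_SPACE` name, and both helper functions are assumptions. It enumerates a full grid and, separately, a set of runs that vary one hyperparameter at a time from a baseline, which is one simple way to probe sensitivity.

```python
from itertools import product

# Hypothetical candidate values. The abstract names these five
# hyperparameters but does not state which settings were tried.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "weight_decay": [0.0, 1e-5, 1e-3],   # L2 regularization strength
    "dropout_rate": [0.0, 0.25, 0.5],
    "activation": ["relu", "sigmoid"],
    "batch_norm": [False, True],
}


def enumerate_configs(space):
    """Yield one dict per combination of candidate values (a full grid)."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def one_at_a_time(space, baseline):
    """Yield configs that differ from the baseline in exactly one hyperparameter."""
    for name, candidates in space.items():
        for value in candidates:
            if value != baseline[name]:
                yield {**baseline, name: value}


configs = list(enumerate_configs(SEARCH_SPACE))
baseline = configs[0]  # first candidate of every hyperparameter
sensitivity_runs = list(one_at_a_time(SEARCH_SPACE, baseline))
print(len(configs), len(sensitivity_runs))
```

Each resulting dict could be passed to whatever training routine builds the DNN. The one-at-a-time set is far smaller than the full grid, at the cost of not capturing interactions between hyperparameters.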