Idakwo Gabriel, Thangapandian Sundar, Luttrell Joseph, Zhou Zhaoxian, Zhang Chaoyang, Gong Ping
School of Computing Sciences and Computer Engineering, The University of Southern Mississippi, Hattiesburg, MS, United States.
Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, United States.
Front Physiol. 2019 Aug 13;10:1044. doi: 10.3389/fphys.2019.01044. eCollection 2019.
Deep learning (DL) has attracted the attention of computational toxicologists because it potentially offers greater power for predictive toxicology than existing shallow learning algorithms. However, contradictory reports have been documented. To further explore the advantages of DL over shallow learning, we conducted this case study using two cell-based androgen receptor (AR) activity datasets with 10K chemicals generated from the Tox21 program. A nested double-loop cross-validation approach was adopted along with a stratified sampling strategy that partitioned chemicals of multiple AR activity classes (i.e., agonist, antagonist, inactive, and inconclusive) at the same distribution rates among the training, validation, and test subsets. Deep neural networks (DNN) and random forest (RF), representing deep and shallow learning algorithms, respectively, were chosen to carry out structure-activity relationship-based chemical toxicity prediction. Results suggest that DNN significantly outperformed RF (p < 0.001, ANOVA) by 22-27% for four metrics (precision, recall, F-measure, and AUPRC) and by 11% for another (AUROC). Further in-depth analyses of chemical scaffolds provided insight into structural alerts for AR agonists/antagonists and inactive/inconclusive compounds, which may aid future drug discovery and the improvement of toxicity prediction modeling.
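The nested double-loop cross-validation with stratified sampling described above can be illustrated with a minimal sketch. It is not the authors' implementation: the fold counts (`outer_k=5`, `inner_k=4`), the helper names, and the toy class counts are all assumptions chosen for illustration, since the abstract does not state them. The outer loop holds out a test subset; the inner loop splits the remainder into training and validation subsets, and every subset keeps roughly the same class proportions.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds with similar class proportions in each fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal class members round-robin so each fold gets a near-equal share.
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def nested_stratified_splits(labels, outer_k=5, inner_k=4, seed=0):
    """Yield (train, validation, test) index lists for each outer/inner fold pair."""
    outer = stratified_folds(labels, outer_k, seed)
    for o, test in enumerate(outer):
        # Everything outside the held-out test fold feeds the inner loop.
        rest = [i for f, fold in enumerate(outer) if f != o for i in fold]
        rest_labels = [labels[i] for i in rest]
        inner = stratified_folds(rest_labels, inner_k, seed + o + 1)
        for v, val_pos in enumerate(inner):
            val = [rest[p] for p in val_pos]
            train = [rest[p] for f, fold in enumerate(inner) if f != v for p in fold]
            yield train, val, test


# Hypothetical, imbalanced toy labels (not the Tox21 class counts).
labels = (["inactive"] * 80 + ["inconclusive"] * 20
          + ["antagonist"] * 12 + ["agonist"] * 8)
splits = list(nested_stratified_splits(labels))

train, val, test = splits[0]
print(len(splits), "splits")
print("test class counts:", dict(Counter(labels[i] for i in test)))
```

In a full pipeline, each model (for example a DNN or an RF) would be fitted on `train`, tuned on `val`, and scored once on `test`. Dealing class members round-robin is one simple way to keep class rates aligned across subsets; library routines such as scikit-learn's `StratifiedKFold` serve the same purpose.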