J Chem Inf Model. 2019 Mar 25;59(3):1253-1268. doi: 10.1021/acs.jcim.8b00785. Epub 2019 Jan 24.
Successful drug discovery projects require control and optimization of compound properties related to pharmacokinetics, pharmacodynamics, and safety. As the volume and chemotype coverage of public and corporate ADME-Tox (absorption, distribution, metabolism, excretion, and toxicity) databases continue to grow, deep neural networks (DNNs) have emerged as a transformative artificial intelligence technology for analyzing these challenging data. DNNs identify relevant features automatically, and suitable data sets can be combined into multitask networks to uncover hidden trends among multiple ADME-Tox parameters in implicitly correlated data sets. Here we describe a novel, fully industrialized approach to parametrize and optimize the setup, training, application, and visual interpretation of DNNs that model ADME-Tox data. The investigated properties include microsomal lability in different species, passive permeability in Caco-2/TC7 cells, and logD. Statistical models were developed using up to 50 000 compounds from public or corporate databases. Both the choice of DNN hyperparameters and the type and quantity of molecular descriptors were found to be important for successful DNN modeling. A multitask approach based on alternate learning of multiple ADME-Tox properties performed statistically better than single-task DNN models on most of the data sets studied, and it also provides a scalable way to predict ADME-Tox properties from heterogeneous data. For example, on external validation sets for human metabolic lability, predictive quality improved from R = 0.6 with the single-task DNN to R = 0.7 with the multitask DNN. Beyond statistical evaluation, we introduce a new visualization approach for interpreting DNN models, termed a "response map", which helps detect local property gradients based on structure fragmentation and derivatization.
This method was successfully applied to visualize fragment contributions and guide further design in drug discovery programs, as illustrated for CXCR3 antagonists and renin inhibitors.
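The abstract does not specify how heterogeneous data sets are combined in one multitask network. In such data, most compounds are measured for only a few endpoints. A common approach, assumed here and not confirmed as the authors' implementation, is to mask missing labels so they contribute nothing to the loss. The minimal pure-Python sketch below shows such a masked multitask loss. It also includes a Pearson correlation helper for external-validation comparisons, on the assumption that R in the abstract denotes the Pearson coefficient, which the abstract does not define. The function names and the None-for-missing convention are hypothetical.

```python
import math
from typing import Optional, Sequence

# Hypothetical layout (an assumption, not from the paper): one row per compound,
# one column per ADME-Tox endpoint (e.g. human microsomal lability, Caco-2
# permeability, logD). None marks an endpoint never measured for that compound.
Label = Optional[float]


def masked_multitask_loss(
    predictions: Sequence[Sequence[float]],
    targets: Sequence[Sequence[Label]],
) -> float:
    """Mean squared error averaged over tasks, counting only measured labels."""
    n_tasks = len(predictions[0])
    sq_err = [0.0] * n_tasks
    counts = [0] * n_tasks
    for pred_row, target_row in zip(predictions, targets):
        for t in range(n_tasks):
            y = target_row[t]
            if y is None:
                continue  # missing label: contributes nothing to the loss
            sq_err[t] += (pred_row[t] - y) ** 2
            counts[t] += 1
    # Average within each task first, so data-rich tasks do not swamp sparse ones.
    per_task = [e / c for e, c in zip(sq_err, counts) if c > 0]
    if not per_task:
        raise ValueError("no measured labels in this batch")
    return sum(per_task) / len(per_task)


def pearson_r(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation coefficient between predictions and measurements."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)
```

Consider predictions `[[1.0, 2.0], [0.5, 3.0], [2.0, 1.0]]` and labels `[[1.5, None], [None, 2.0], [2.0, 1.5]]`. Each task's error is averaged over its two measured compounds (0.125 and 0.625), which gives a loss of 0.375. Averaging per task before averaging across tasks is only one design choice. Weighting each task by its label count is an equally plausible alternative. On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient as `pearson_r`.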