Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas, 72079, USA.
Sci Rep. 2017 Dec 11;7(1):17311. doi: 10.1038/s41598-017-17701-7.
Drug-induced liver injury (DILI) presents a significant challenge to drug development and regulatory science. The FDA's Liver Toxicity Knowledge Base (LTKB) evaluated >1000 drugs for their likelihood of causing DILI in humans, of which >700 were classified into three categories (most-DILI, less-DILI, and no-DILI). Based on this dataset, we developed and compared 2-class and 3-class DILI prediction models using the Decision Forest (DF) machine learning algorithm with Mold2 structural descriptors. The models were evaluated through 1000 iterations of 5-fold cross-validation, 1000 bootstrapping validations, and 1000 permutation tests (which assessed chance correlation). Furthermore, we conducted a prediction confidence analysis, which provides an additional parameter for proper interpretation of prediction results. We found that, compared with the 2-class DILI model, the 3-class model not only estimated DILI risk at a higher resolution but also better differentiated most-DILI drugs from no-DILI drugs. We demonstrated the utility of the models on drug ingredients for which the FDA had very recently issued warnings. Moreover, we identified informative molecular features important for assessing DILI risk. Our results suggest that the 3-class model is a better option for drug safety evaluation than the binary model, on which most publications focus.
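The permutation test mentioned above (assessing chance correlation around a 5-fold cross-validation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a toy nearest-centroid classifier stands in for Decision Forest, two synthetic features stand in for Mold2 descriptors, the data are randomly generated rather than taken from LTKB, the function names are hypothetical, and 200 permutations are used instead of 1000 to keep the run short.

```python
import random

def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Toy stand-in model (NOT Decision Forest): one mean vector per class."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def predict(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def cv_accuracy(X, y, k, rng):
    """Overall accuracy from one round of k-fold cross-validation."""
    correct = 0
    for fold in k_fold_indices(len(X), k, rng):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(X)

def permutation_p_value(X, y, k=5, n_perm=200, seed=0):
    """Compare the real-label CV accuracy with accuracies on shuffled labels."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y, k, rng)
    hits = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)  # break any real link between features and labels
        if cv_accuracy(X, shuffled, k, rng) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

def make_toy_data(n_per_class=30, seed=1):
    """Synthetic 3-class data; the class centers are arbitrary assumptions."""
    rng = random.Random(seed)
    centers = {"no-DILI": (0.0, 0.0), "less-DILI": (3.0, 3.0), "most-DILI": (6.0, 0.0)}
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            y.append(label)
    return X, y

X, y = make_toy_data()
observed, p_value = permutation_p_value(X, y)
print(f"5-fold CV accuracy: {observed:.3f}, permutation p-value: {p_value:.4f}")
```

A small p-value here means that models trained on shuffled labels rarely match the accuracy obtained with the real labels, which is the sense in which a permutation test rules out chance correlation.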