In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany.
Leibniz-Forschungsinstitut für Molekulare Pharmakologie (FMP), Robert-Roessle Strasse 10, 13125, Berlin, Germany.
J Comput Aided Mol Des. 2020 Jul;34(7):731-746. doi: 10.1007/s10822-020-00310-4. Epub 2020 Apr 16.
In drug development, late-stage toxicity of a compound is the main cause of failure in clinical trials. In silico methods are therefore highly important for guiding the early design process and reducing time, costs and animal testing. Technical advances and the ever-growing amount of available toxicity data have enabled machine learning, especially neural networks, to impact the field of predictive toxicology. In this study, cytotoxicity prediction, one of the earliest handles in drug discovery, is investigated using a deep learning approach trained on a highly consistent in-house data set of over 34,000 compounds, of which less than 5% are cytotoxic. The model reached a balanced accuracy of over 70%, similar to previously reported studies using Random Forest. Although they yield good results, neural networks are often described as black boxes that offer little mechanistic insight into the underlying model. To overcome this lack of interpretability, a Deep Taylor Decomposition method is investigated to identify substructures that may be responsible for the cytotoxic effects, the so-called toxicophores. Furthermore, this study introduces cytotoxicity maps, which provide a visual structural interpretation of the relevance of these substructures. This approach could be helpful in drug development for predicting the potential toxicity of a compound and for generating new insights into the toxic mechanism. Moreover, it could help to de-risk and optimize compounds.
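The abstract reports balanced accuracy rather than plain accuracy, which matters when fewer than 5% of compounds are cytotoxic. The following is a minimal sketch of why, using a made-up toy set of 1,000 labels with 5% positives. The data, the function name `balanced_accuracy` and the "always non-toxic" baseline are illustrative assumptions, not the study's data, code or results.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical toy labels: 50 cytotoxic (1) and 950 non-cytotoxic (0) compounds.
y_true = [1] * 50 + [0] * 950

# A trivial baseline that labels every compound as non-cytotoxic.
always_nontoxic = [0] * len(y_true)

plain_acc = sum(t == p for t, p in zip(y_true, always_nontoxic)) / len(y_true)
bal_acc = balanced_accuracy(y_true, always_nontoxic)

print(f"plain accuracy:    {plain_acc:.2f}")
print(f"balanced accuracy: {bal_acc:.2f}")
```

On this toy set the trivial baseline scores 0.95 in plain accuracy but only 0.50 in balanced accuracy, so a balanced accuracy above 0.5 indicates that a model recovers some of the minority class rather than exploiting the class imbalance.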