Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria.
Department of Chemistry and Computational Biology Unit (CBU), University of Bergen, N-5020 Bergen, Norway.
Toxicol Lett. 2023 May 15;381:20-26. doi: 10.1016/j.toxlet.2023.04.005. Epub 2023 Apr 13.
In silico methods are essential to the safety evaluation of chemicals. Computational risk assessment offers several approaches, among which data science and knowledge-based methods are becoming an increasingly important sub-group. A key attribute of data science is that it allows existing data to be used to find correlations, build strong hypotheses, and create new, valuable knowledge that may help reduce the number of resource-intensive experiments. When choosing a suitable method for toxicity prediction, the available data and the desired toxicity endpoint are two essential factors to consider. The complexity of the endpoint can affect the success rate of in silico models. For highly complex endpoints such as hepatotoxicity, it can be beneficial to decipher the toxic event from a more systemic point of view. We propose a data science-based modelling pipeline that uses compounds' connections to tissue-specific biological targets, the interactome, and biological pathways as compound descriptors. Models trained on different combinations of the collected compound-target, compound-interactor, and compound-pathway profiles were used to predict the hepatotoxicity of drug-like compounds. Several tree-based models were trained using separate and combined target-, interactome- and pathway-level variables. The model that combined descriptors from all levels with the random forest algorithm was further optimized. The importance of each descriptor for model performance was assessed and examined for a biological explanation, in order to identify which targets or pathways may play a crucial role in toxicity. Descriptors connected to cytochrome P450 enzymes, heme degradation and biological oxidation received high weights. Furthermore, other, less discussed processes, such as the involvement of RHO GTPase effectors in hepatotoxicity, were marked as fundamental.
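To make the idea of "connections as descriptors" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each compound has a set of known targets and that interactors and pathways are reached through those targets; the helper `build_descriptors`, the `T:`/`I:`/`P:` prefixes, and the toy gene and pathway annotations are all hypothetical and serve only as illustration. Each level yields a binary profile, and the levels are concatenated into one combined descriptor matrix.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy annotations for illustration only (not data from the study).
COMPOUND_TARGETS: Dict[str, Set[str]] = {
    "cmpd_A": {"CYP3A4", "HMOX1"},
    "cmpd_B": {"RHOA"},
}
# Assumption: interactors and pathways are reached through a compound's targets.
TARGET_INTERACTORS: Dict[str, Set[str]] = {
    "CYP3A4": {"POR"},
    "HMOX1": {"BLVRB"},
    "RHOA": {"ROCK1", "DIAPH1"},
}
TARGET_PATHWAYS: Dict[str, Set[str]] = {
    "CYP3A4": {"Biological oxidations"},
    "HMOX1": {"Heme degradation"},
    "RHOA": {"RHO GTPase effectors"},
}


def build_descriptors(
    compounds: List[str],
    compound_targets: Dict[str, Set[str]],
    target_interactors: Dict[str, Set[str]],
    target_pathways: Dict[str, Set[str]],
) -> Tuple[List[str], List[List[int]]]:
    """Return combined binary target/interactor/pathway profiles (hypothetical helper)."""

    def reached(cmpd: str, level: Dict[str, Set[str]]) -> Set[str]:
        # Union of the annotations linked to any of the compound's targets.
        hits: Set[str] = set()
        for target in compound_targets.get(cmpd, set()):
            hits |= level.get(target, set())
        return hits

    profiles: List[Set[str]] = []
    for cmpd in compounds:
        # Prefix each level so names stay unique after the levels are combined.
        feats = {"T:" + t for t in compound_targets.get(cmpd, set())}
        feats |= {"I:" + i for i in reached(cmpd, target_interactors)}
        feats |= {"P:" + p for p in reached(cmpd, target_pathways)}
        profiles.append(feats)

    # Shared, sorted column vocabulary; 1 = compound is connected to that descriptor.
    names = sorted(set().union(*profiles))
    matrix = [[1 if name in feats else 0 for name in names] for feats in profiles]
    return names, matrix


names, X = build_descriptors(
    ["cmpd_A", "cmpd_B"], COMPOUND_TARGETS, TARGET_INTERACTORS, TARGET_PATHWAYS
)
print(len(names), X[0])  # → 10 [1, 0, 1, 0, 1, 1, 0, 1, 1, 0]
```

Separate target-, interactome- or pathway-level models would simply keep the columns with one prefix. A matrix of this shape could then be passed to a tree-based learner such as scikit-learn's `RandomForestClassifier`, though the abstract does not specify the software that was used.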
The optimized combined model using only the selected descriptors yielded the best performance, with an accuracy of 0.766. Models built on the same dataset with classical Morgan fingerprints as the compound representation achieved similar performance measures, as did models combining the systems biology-based descriptors with Morgan fingerprints. Consequently, adding the structural information of compounds did not enhance the predictive value of the models. The developed systems biology-based pipeline constitutes a valuable tool for predicting toxicity, while providing novel insights into the possible mechanisms of the unwanted events.
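The abstract does not state how the descriptors of the optimized model were selected. One common approach, assumed here purely for illustration, is to rank descriptors by an importance score (for example the impurity-based importances a random forest reports) and keep the top k. The sketch below shows that step together with the accuracy metric (the fraction of correctly classified compounds). The helpers `select_top_k` and `accuracy`, the value of k, and the example scores and labels are hypothetical and are not the values reported in the study.

```python
from typing import List, Sequence, Tuple


def select_top_k(
    names: Sequence[str],
    matrix: Sequence[Sequence[int]],
    importances: Sequence[float],
    k: int,
) -> Tuple[List[str], List[List[int]]]:
    """Keep the k descriptors with the highest importance scores (hypothetical helper)."""
    if len(importances) != len(names):
        raise ValueError("one importance score is needed per descriptor")
    # Rank columns by importance; sorted() is stable, so ties keep input order.
    ranked = sorted(range(len(names)), key=lambda j: importances[j], reverse=True)[:k]
    keep = sorted(ranked)  # restore the original column order
    return [names[j] for j in keep], [[row[j] for j in keep] for row in matrix]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds whose predicted label matches the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only; not the scores or predictions reported in the study.
names = ["T:CYP3A4", "P:Heme degradation", "I:ROCK1", "T:RHOA"]
X = [[1, 1, 0, 0], [0, 0, 1, 1]]
selected, X_sel = select_top_k(names, X, [0.40, 0.30, 0.05, 0.25], k=2)
print(selected, X_sel)  # → ['T:CYP3A4', 'P:Heme degradation'] [[1, 1], [0, 0]]
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # → 0.75
```

Restoring the original column order after ranking keeps the reduced matrix aligned with the full descriptor vocabulary, which simplifies mapping important columns back to their targets or pathways for biological interpretation.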