Morger Andrea, Mathea Miriam, Achenbach Janosch H, Wolf Antje, Buesen Roland, Schleifer Klaus-Juergen, Landsiedel Robert, Volkamer Andrea
In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Charitéplatz 1, Berlin, Germany.
BASF SE, 67056, Ludwigshafen, Germany.
J Cheminform. 2020 Apr 14;12(1):24. doi: 10.1186/s13321-020-00422-x.
Risk assessment of newly synthesised chemicals is a prerequisite for regulatory approval. In this context, in silico methods have great potential to reduce time, cost, and ultimately animal testing, as they make use of the ever-growing amount of available toxicity data. Here, we present KnowTox, a novel pipeline that combines three different in silico toxicology approaches to allow for confident prediction of potentially toxic effects of query compounds: machine learning models for 88 endpoints, alerts for 919 toxic substructures, and computational support for read-across. It is mainly based on the ToxCast dataset, which after preprocessing comprises a sparse matrix of 7912 compounds tested against 985 endpoints. When machine learning models are applied, the applicability and reliability of predictions for new chemicals are of utmost importance. Therefore, first, the conformal prediction technique was deployed; it adds a calibration step and, by definition, creates internally valid predictors at a given significance level. Second, to further improve validity and information efficiency, two adaptations are suggested, illustrated for the androgen receptor antagonism endpoint. Introducing KNNRegressor normalisation achieved an absolute increase in validity of 23% on the in-house dataset of 534 compounds. This increase in validity comes at the cost of efficiency, which could be improved again by 20% for the initial ToxCast model by balancing the dataset during model training. Finally, the value of the developed pipeline for risk assessment is discussed using two in-house triazole molecules. Compared to a single toxicity prediction method, combining the outputs of different approaches can have a higher impact on guiding toxicity testing and on de-selecting the most likely harmful development-candidate compounds early in the development process.
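The conformal prediction step summarised above can be illustrated with a minimal sketch. It assumes a binary endpoint, predicted class probabilities from some underlying model, and a held-out calibration set; the inverse-probability nonconformity score, the `beta` smoothing term and the `knn_difficulty` helper are illustrative assumptions, not the KnowTox implementation. Class-conditional (Mondrian) p-values are computed against calibration compounds of the same class, and a class enters the prediction set when its p-value exceeds the significance level, which is what makes the predictor valid under exchangeability. The optional `sigma` argument shows one way a normalised score could be formed, by dividing by a k-nearest-neighbour difficulty estimate; this is a hypothetical stand-in for the KNNRegressor normalisation mentioned above.

```python
import numpy as np


def knn_difficulty(X_ref, err_ref, X_query, k=5):
    """Hypothetical difficulty estimate: mean error of the k nearest reference
    compounds (Euclidean distance), standing in for a KNN regressor."""
    d = np.linalg.norm(X_query[:, None, :] - X_ref[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return err_ref[idx].mean(axis=1)


def nonconformity(proba, labels, sigma=None, beta=0.1):
    """Inverse-probability score 1 - p(label | x); if sigma is given, the
    score is normalised by (beta + sigma) so 'hard' compounds score lower."""
    alpha = 1.0 - proba[np.arange(len(labels)), labels]
    if sigma is not None:
        alpha = alpha / (beta + sigma)
    return alpha


def mondrian_p_values(cal_scores, cal_labels, test_proba, test_sigma=None, beta=0.1):
    """Class-conditional (Mondrian) p-values for every test compound and class."""
    n_test, n_classes = test_proba.shape
    p = np.zeros((n_test, n_classes))
    for c in range(n_classes):
        ref = cal_scores[cal_labels == c]           # calibration scores of class c only
        labels = np.full(n_test, c)                 # tentatively assign class c
        a = nonconformity(test_proba, labels, test_sigma, beta)
        # share of calibration scores at least as nonconforming (+1 for the test point)
        p[:, c] = ((ref[None, :] >= a[:, None]).sum(axis=1) + 1) / (len(ref) + 1)
    return p


def prediction_sets(p_values, significance):
    """Boolean matrix: class c is in the set if its p-value exceeds significance."""
    return p_values > significance


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def fake(n):
        # synthetic two-class model outputs (hypothetical data, not ToxCast)
        y = rng.integers(0, 2, n)
        p1 = np.clip(0.5 + (y - 0.5) * 0.4 + rng.normal(0, 0.2, n), 0.01, 0.99)
        return np.column_stack([1 - p1, p1]), y

    cal_p, cal_y = fake(500)
    test_p, test_y = fake(1000)
    cal_a = nonconformity(cal_p, cal_y)             # calibration step
    pv = mondrian_p_values(cal_a, cal_y, test_p)
    sets = prediction_sets(pv, 0.2)
    err = 1 - sets[np.arange(len(test_y)), test_y].mean()
    print(f"error rate at significance 0.2: {err:.3f}")
```

Validity here means that, for each class, the long-run fraction of compounds whose true label is excluded from the prediction set stays at or below the significance level; efficiency refers to how often the sets contain a single label rather than both, which is the trade-off between the two adaptations described above.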