Department of Chemistry and Biology, Pfizer Global Research and Development, Sandwich Laboratories, Sandwich, Kent, CT13 9NJ, UK.
J Cheminform. 2010 Dec 9;2(1):11. doi: 10.1186/1758-2946-2-11.
We collected data from over 80 different cytotoxicity assays, drawn from Pfizer in-house work as well as from public sources, and investigated whether these datasets, which come from a variety of assay formats (differing, for instance, in measured endpoint, incubation time and cell type), can be used to derive a general cytotoxicity model. Our main aim was to derive a computational model from these data that can highlight potentially cytotoxic compound series early in the drug discovery process.
We developed a Bayesian model for each assay using Scitegic FCFP_6 fingerprints together with the default physical property descriptors. Mutually predictive assay pairs were identified by calculating the ROC score obtained when the model derived from one assay predicts the experimental outcomes of the other, and vice versa. These prediction pairs were visualised as a network in which nodes are assays and an edge joins two assays when the ROC score exceeds 0.60 in both directions. We observed that mutual predictivity is not necessarily transitive: if assay pairs (A, B) and (B, C) were mutually predictive, the pair (A, C) often was not. The results from 48 interconnected assays were merged into one training set of 145,590 compounds, from which a general cytotoxicity model was derived. The model was cross-validated and also validated against a set of 89 FDA-approved drug compounds.
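The network step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Pipeline Pilot workflow: the per-assay Bayesian models are not reproduced, and their cross-prediction ROC scores are taken as given. The ROC score is assumed to be the area under the ROC curve, computed with the pairwise (Mann-Whitney) formulation. The names `roc_auc`, `mutual_edges`, `connected_components` and `cross_roc` are hypothetical, and the three-assay scores at the bottom are invented. They show that an A-B edge and a B-C edge do not imply an A-C edge, while all three assays still fall into one connected group. That grouping is one plausible reading of how the 48 interconnected assays were selected.

```python
from collections import defaultdict, deque
from itertools import combinations


def roc_auc(scores, labels):
    """Area under the ROC curve: probability that a random active (label 1)
    scores higher than a random inactive (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mutual_edges(cross_roc, threshold=0.60):
    """cross_roc[(a, b)] (hypothetical structure) is the ROC score of the model
    trained on assay a when predicting the measured outcomes of assay b.
    An edge is kept only if the score exceeds the threshold in BOTH directions."""
    assays = sorted({a for pair in cross_roc for a in pair})
    edges = set()
    for a, b in combinations(assays, 2):
        if (cross_roc.get((a, b), 0.0) > threshold
                and cross_roc.get((b, a), 0.0) > threshold):
            edges.add((a, b))
    return assays, edges


def connected_components(nodes, edges):
    """Group assays linked directly or through other assays (breadth-first search)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(comp))
    return components


# Invented cross-prediction scores for three hypothetical assays A, B, C.
cross_roc = {
    ("A", "B"): 0.72, ("B", "A"): 0.65,
    ("B", "C"): 0.68, ("C", "B"): 0.70,
    ("A", "C"): 0.58, ("C", "A"): 0.66,  # A -> C fails the 0.60 threshold
}
assays, edges = mutual_edges(cross_roc)
print(sorted(edges))                              # [('A', 'B'), ('B', 'C')]
print(connected_components(assays, edges))        # [['A', 'B', 'C']]
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise `roc_auc` loop is quadratic and is meant only to make the definition explicit. For training sets on the scale of 145,590 compounds, a sort-based or library implementation would be the practical choice.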
We have generated a predictive model for general cytotoxicity that could speed up the drug discovery process in several ways. Firstly, this analysis has shown that the outcomes of different assay formats can be mutually predictive, removing the need to submit a potentially toxic compound to multiple assays. Furthermore, the analysis enables selection of (a) the easiest-to-run assay as the corporate standard, or (b) the most descriptive panel of assays, by including assays whose outcomes are not mutually predictive. The model is not a replacement for a cytotoxicity assay, but it makes it possible to be more selective about which compounds are submitted to one. On a more mundane level, having data from more than 80 assays in one dataset answers, for the first time, the question: "What are the known cytotoxic compounds in the Pfizer compound collection?" Finally, a predictive cytotoxicity model will assist the design of new compounds with a desired cytotoxicity profile, since comparing the model output with data from an in vitro safety/toxicology assay suggests that one is predictive of the other.
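The two selection strategies (a) and (b) can be illustrated on a mutual-predictivity network. This is a hypothetical sketch, not a procedure from the study. For (a), `run_cost` is an invented per-assay cost score supplied by the user. For (b), the panel is built with a greedy independent-set heuristic, which ensures that no two chosen assays are joined by an edge. This heuristic is a swapped-in technique that returns a maximal panel but does not guarantee the largest possible one. The function names and the example network are assumptions.

```python
def pick_standard(component, run_cost):
    """(a) Among mutually predictive assays, pick the one that is cheapest to run.
    run_cost is a hypothetical {assay: cost} mapping; ties are broken by name."""
    return min(component, key=lambda a: (run_cost[a], a))


def pick_panel(assays, edges):
    """(b) Greedily build a panel in which no two assays are mutually predictive
    (an independent set of the network), visiting low-degree assays first."""
    neighbours = {a: set() for a in assays}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    panel, excluded = [], set()
    for a in sorted(assays, key=lambda x: (len(neighbours[x]), x)):
        if a not in excluded:
            panel.append(a)
            excluded |= neighbours[a]  # neighbours are redundant with a
    return panel


# Hypothetical network: A-B and B-C are mutually predictive; D stands alone.
edges = {("A", "B"), ("B", "C")}
print(pick_standard(["A", "B", "C"], {"A": 3, "B": 1, "C": 2}))  # B
print(pick_panel(["A", "B", "C", "D"], edges))                   # ['D', 'A', 'C']
```

Visiting low-degree assays first is a common heuristic for keeping an independent set large. Here it keeps A and C, which are not mutually predictive, instead of the single hub assay B.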