Zhu Hao, Rusyn Ivan, Richard Ann, Tropsha Alexander
Carolina Environmental Bioinformatics Research Center, University of North Carolina, Chapel Hill, NC 27599-7360, USA.
Environ Health Perspect. 2008 Apr;116(4):506-13. doi: 10.1289/ehp.10573.
To develop efficient approaches for rapid evaluation of chemical toxicity and human health risk of environmental compounds, the National Toxicology Program (NTP) in collaboration with the National Center for Chemical Genomics has initiated a project on high-throughput screening (HTS) of environmental chemicals. The first HTS results for a set of 1,408 compounds tested for their effects on cell viability in six different cell lines have recently become available via PubChem.
We explored the utility of these data for predicting adverse health effects of environmental agents.
Initially, the classification k nearest neighbor (kNN) quantitative structure-activity relationship (QSAR) modeling method was applied to the HTS data only, for a curated data set of 384 compounds. The resulting models had prediction accuracies as high as 89% for the training set, 71% for the test set (training and test sets together contained 275 compounds), and 74% for the external validation set (109 compounds). We then asked whether HTS results could be of value in predicting rodent carcinogenicity. We identified 383 compounds for which data were available from both the Berkeley Carcinogenic Potency Database and NTP-HTS studies. We found that compounds classified by HTS as "actives" in at least one cell line were likely to be rodent carcinogens (sensitivity 77%); however, HTS "inactives" were far less informative (specificity 46%). Using chemical descriptors only, kNN QSAR modeling of this data set achieved 62.3% prediction accuracy for rodent carcinogenicity. Importantly, the prediction accuracy of the model improved significantly, to 72.7%, when chemical descriptors were augmented by HTS data, which were treated as biological descriptors.
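As a minimal illustration of how the sensitivity and specificity figures above are defined, the following sketch treats an HTS "active" call as a positive prediction of rodent carcinogenicity. The function name and the eight-compound toy data are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for paired boolean labels.

    predicted: True if the compound is an HTS "active" (hypothetical call).
    actual:    True if the compound is a rodent carcinogen.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # actives that are carcinogens
    fn = sum(1 for p, a in pairs if not p and a)      # inactives that are carcinogens
    tn = sum(1 for p, a in pairs if not p and not a)  # inactives that are non-carcinogens
    fp = sum(1 for p, a in pairs if p and not a)      # actives that are non-carcinogens
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for eight made-up compounds (illustrative only, not study data).
hts_active = [True, True, True, False, False, True, False, False]
carcinogen = [True, True, False, True, False, True, False, True]

sens, spec = sensitivity_specificity(hts_active, carcinogen)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity is the fraction of carcinogens that HTS flags as active, and specificity is the fraction of non-carcinogens that HTS leaves inactive.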
Our studies suggest that combining NTP-HTS profiles with conventional chemical descriptors could considerably improve the predictive power of computational approaches in toxicology.
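The idea of augmenting chemical descriptors with HTS results can be sketched as simple feature concatenation followed by a nearest-neighbor vote. This is a plain kNN classifier under stated assumptions, not the kNN QSAR procedure used in the study; the helper names, descriptor values, and six-element HTS activity flags below are all hypothetical.

```python
import math
from collections import Counter


def augment(chem_descriptors, hts_profile):
    """Concatenate chemical descriptors with HTS results used as biological descriptors."""
    return list(chem_descriptors) + list(hts_profile)


def knn_predict(train_X, train_y, query, k=3):
    """Predict a class label by majority vote among the k nearest training points."""
    ranked = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: two chemical descriptors per compound plus
# one activity flag per cell line (six cell lines); 1 = carcinogen.
train_chem = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
train_hts = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 1],
    [1, 0, 1, 1, 1, 0],
]
train_y = [0, 0, 1, 1]

train_aug = [augment(c, h) for c, h in zip(train_chem, train_hts)]

# A query compound whose chemical descriptors alone are equidistant from
# both classes, but whose HTS profile resembles the carcinogens.
query_aug = augment([0.5, 0.5], [1, 1, 0, 1, 1, 0])
prediction = knn_predict(train_aug, train_y, query_aug, k=3)
print(prediction)
```

In practice the two descriptor types would usually be scaled to comparable ranges before concatenation, so that neither dominates the distance.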