Department of Health and Human Services, NIH Chemical Genomics Center, National Institutes of Health, Bethesda, Maryland 20892-3370, USA.
Toxicol Sci. 2009 Dec;112(2):385-93. doi: 10.1093/toxsci/kfp231. Epub 2009 Oct 4.
In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model, which we call weighted feature significance (WFS), to predict the toxicological activity of compounds based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on three data sets: (1) data from quantitative high-throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance, compiled into a simple additive model of toxicity, and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency that was also tested at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power; in the prediction of hepatotoxic compounds in particular, WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation.
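The scoring idea described in the abstract (test each structural feature for enrichment among toxic compounds, then add up the weights of the significant features a compound contains) can be illustrated with a minimal sketch. The abstract does not specify the statistical test, the weighting, or the significance cutoff, so everything below is an assumption made for illustration: a one-sided hypergeometric (Fisher-type) enrichment test, a weight of -log10(p), a cutoff `alpha = 0.05`, and hypothetical feature names and toy data. It should not be read as the authors' implementation.

```python
from math import comb, log10


def enrichment_p_value(k, n_toxic, K, N):
    """One-sided hypergeometric tail P(X >= k).

    N: total compounds, n_toxic: toxic compounds,
    K: compounds containing the feature, k: toxic compounds containing it.
    (Assumed test; the abstract only says "statistical significance".)
    """
    upper = min(K, n_toxic)
    tail = sum(comb(K, i) * comb(N - K, n_toxic - i) for i in range(k, upper + 1))
    return tail / comb(N, n_toxic)


def train_weights(compounds, labels, alpha=0.05):
    """Return {feature: weight} for features enriched in toxic compounds.

    compounds: list of feature sets; labels: 1 = toxic, 0 = non-toxic.
    The weight -log10(p) and the cutoff alpha are illustrative assumptions.
    """
    N = len(compounds)
    n_toxic = sum(labels)
    weights = {}
    for feature in set().union(*compounds):
        K = sum(1 for c in compounds if feature in c)
        k = sum(1 for c, y in zip(compounds, labels) if y and feature in c)
        p = enrichment_p_value(k, n_toxic, K, N)
        if p < alpha:
            weights[feature] = -log10(p)
    return weights


def score(compound, weights):
    """Additive score: sum of the weights of the significant features present."""
    return sum(weights.get(feature, 0.0) for feature in compound)


# Hypothetical toy data: feature names and labels are made up for illustration.
compounds = [
    {"nitro", "hydroxyl"}, {"nitro", "methyl"}, {"nitro", "hydroxyl"}, {"nitro"},
    {"hydroxyl", "methyl"}, {"methyl"}, {"hydroxyl"}, {"methyl"},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights = train_weights(compounds, labels)
print(weights)
print(score({"nitro", "methyl"}, weights), score({"methyl"}, weights))
```

In this toy set only the hypothetical "nitro" feature is significantly enriched among the toxic compounds, so a new compound is scored solely by whether it carries that feature. The per-feature weights stay directly readable, which is the kind of interpretability the abstract emphasizes.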