Conformal Prediction Classification of a Large Data Set of Environmental Chemicals from ToxCast and Tox21 Estrogen Receptor Assays.

Authors

Norinder Ulf, Boyer Scott

Affiliation

Swedish Toxicology Sciences Research Center, SE-151 36 Södertälje, Sweden.

Publication information

Chem Res Toxicol. 2016 Jun 20;29(6):1003-10. doi: 10.1021/acs.chemrestox.6b00037. Epub 2016 May 13.

Abstract

Quantitative structure-activity relationships (QSAR) are critical to exploitation of the chemical information in toxicology databases. Exploitation can be extraction of chemical knowledge from the data but also making predictions for new chemicals based on quantitative analysis of past findings. In this study, we analyzed the ToxCast and Tox21 estrogen receptor data sets using conformal prediction to enhance the full exploitation of the information in these data sets. We applied aggregated conformal prediction (ACP) to the ToxCast and Tox21 estrogen receptor data sets using support vector machine classifiers to compare overall performance of the models but, more importantly, to explore the performance of ACP on data sets that are significantly enriched in one class without employing sampling strategies of the training set. ACP was also used to investigate the problem of applicability domain using both data sets. Comparison of ACP to previous results obtained on the same data sets using traditional QSAR approaches indicated similar overall balanced performance to methods in which careful training set selections were made, e.g., sensitivity and specificity for the external Tox21 data set of 70-75%, and far superior results to those obtained using traditional methods without training set sampling, where the corresponding results showed a clear imbalance of 50 and 96%, respectively. Application of conformal prediction to imbalanced data sets facilitates an unambiguous analysis of all data, allows accurate predictive models to be built which display similar accuracy in external validation to internal validation, and, most importantly, allows an unambiguous treatment of the applicability domain.
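To make the idea of aggregated conformal prediction on an imbalanced data set more concrete, the following is a minimal sketch and not the authors' implementation. Several assumptions are made for illustration: the support vector machine is replaced by a toy nearest-centroid scorer on a single synthetic feature, calibration is done per class (often called Mondrian calibration, which the abstract does not explicitly state was used), p-values from the individual models are aggregated by their median, and all function names and parameter values are hypothetical.

```python
import random
from statistics import median

LABELS = (0, 1)  # 0 = inactive (majority), 1 = active (minority)

def make_toy_data(n_inactive=300, n_active=40, seed=1):
    """Synthetic, imbalanced 1-D data (an assumption, not ToxCast/Tox21)."""
    rng = random.Random(seed)
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_inactive)]
    data += [(rng.gauss(4.0, 1.0), 1) for _ in range(n_active)]
    return data

def centroid_scorer(train):
    """Toy stand-in for the SVM: nonconformity = distance to the class mean."""
    means = {}
    for label in LABELS:
        xs = [x for x, y in train if y == label]
        means[label] = sum(xs) / len(xs)
    # A larger score means x looks less like a member of `label`.
    return lambda x, label: abs(x - means[label])

def class_conditional_p_values(score, calib, x):
    """Conformal p-value of x for each label, calibrated within that class only."""
    p = {}
    for label in LABELS:
        cal_scores = [score(cx, cy) for cx, cy in calib if cy == label]
        a = score(x, label)
        p[label] = (sum(c >= a for c in cal_scores) + 1) / (len(cal_scores) + 1)
    return p

def acp_p_values(data, x, n_models=10, calib_frac=0.3, seed=0):
    """Aggregate p-values over several random train/calibration splits."""
    rng = random.Random(seed)
    per_model = []
    for _ in range(n_models):
        train, calib = [], []
        for label in LABELS:  # split each class separately so both stay represented
            rows = [row for row in data if row[1] == label]
            rng.shuffle(rows)
            n_cal = max(1, int(len(rows) * calib_frac))
            calib += rows[:n_cal]
            train += rows[n_cal:]
        score = centroid_scorer(train)
        per_model.append(class_conditional_p_values(score, calib, x))
    return {label: median(p[label] for p in per_model) for label in LABELS}

def prediction_set(p_values, significance=0.2):
    """Labels that cannot be rejected at the chosen significance level."""
    return {label for label, p in p_values.items() if p > significance}

if __name__ == "__main__":
    data = make_toy_data()
    for x in (0.0, 4.0, 2.0):
        p = acp_p_values(data, x)
        print(x, p, prediction_set(p))
```

In this sketch the prediction set can contain one label, both labels, or none. An empty set flags a compound that resembles neither class at the chosen significance level, which is one way conformal prediction can express an applicability-domain judgment; because each class is calibrated against its own examples, no resampling of the training set is needed to handle the imbalance.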
