Trade-off Predictivity and Explainability for Machine-Learning Powered Predictive Toxicology: An in-Depth Investigation with Tox21 Data Sets.

Affiliations

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, 3900 NCTR Road, Jefferson, Arkansas 72079, United States.

Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States.

Publication information

Chem Res Toxicol. 2021 Feb 15;34(2):541-549. doi: 10.1021/acs.chemrestox.0c00373. Epub 2021 Jan 29.

Abstract

Selecting a model in predictive toxicology often involves a trade-off between prediction performance and explainability: should we sacrifice model performance to gain explainability, or vice versa? Here we present a comprehensive study assessing how algorithms and features influence model performance in chemical toxicity research. We built over 5000 models for a Tox21 bioassay data set of 65 assays and ∼7600 compounds. Seven molecular representations as features and 12 modeling approaches varying in complexity and explainability were employed to systematically investigate the impact of various factors on model performance and explainability. We demonstrated that end points dictated a model's performance, regardless of the chosen modeling approach (including deep learning) and chemical features. Overall, more complex models such as (LS-)SVM and Random Forest performed marginally better than simpler models such as linear regression and KNN in the presented Tox21 data analysis. Because a simpler model with acceptable performance is often also easier to interpret, it was the preferred choice for the Tox21 data set owing to its better explainability. Given that each data set has its own error structure for both dependent and independent variables, we strongly recommend conducting a systematic study across a broad range of model complexity and feature explainability to identify a model that balances predictivity and explainability.
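The scale of the study can be illustrated with a short sketch. It assumes, purely for illustration, that one model is built per combination of assay, molecular representation, and modeling approach; the abstract does not state that the full grid was used, only that over 5000 models were built. The names `assay_01`, `representation_1`, and `algorithm_01` are hypothetical placeholders, not identifiers from the paper.

```python
from itertools import product

# Counts taken from the abstract. The names generated below are placeholders,
# not the paper's actual assay, feature, or algorithm identifiers.
N_ASSAYS = 65
N_REPRESENTATIONS = 7
N_ALGORITHMS = 12

assays = [f"assay_{i:02d}" for i in range(1, N_ASSAYS + 1)]
representations = [f"representation_{i}" for i in range(1, N_REPRESENTATIONS + 1)]
algorithms = [f"algorithm_{i:02d}" for i in range(1, N_ALGORITHMS + 1)]

# Assumption: one model per (assay, representation, algorithm) combination.
experiment_grid = list(product(assays, representations, algorithms))

print(len(experiment_grid))  # 65 * 7 * 12 = 5460
```

Under that assumption the full grid contains 5460 combinations, which is consistent with the "over 5000 models" reported above.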
