Assessing the calibration in toxicological in vitro models with conformal prediction.

Authors

Morger Andrea, Svensson Fredrik, Arvidsson McShane Staffan, Gauraha Niharika, Norinder Ulf, Spjuth Ola, Volkamer Andrea

Affiliations

In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin, Berlin, Germany.

Alzheimer's Research UK UCL Drug Discovery Institute, London, WC1E 6BT, UK.

Publication information

J Cheminform. 2021 Apr 29;13(1):35. doi: 10.1186/s13321-021-00511-5.

DOI:10.1186/s13321-021-00511-5

PMID:33926567

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8082859/

Abstract

Machine learning methods are widely used in drug discovery and toxicity prediction. Although they generally perform well in cross-validation studies, their predictive power often drops when the query samples have drifted from the descriptor space of the training data. Thus, the assumption underlying machine learning algorithms, that training and test data stem from the same distribution, might not always be fulfilled. In this work, conformal prediction is used to assess the calibration of the models. Deviations from the expected error may indicate that training and test data originate from different distributions. Using the Tox21 datasets as an example, which comprise the chronologically released Tox21Train, Tox21Test and Tox21Score subsets, we observed that although internally valid models could be trained using cross-validation on Tox21Train, predictions on the external Tox21Score data resulted in higher error rates than expected. To improve the predictions on the external sets, we successfully introduced a strategy that exchanges the calibration set for more recent data, such as Tox21Test. We conclude that conformal prediction can be used to diagnose data drifts and other issues related to model calibration. The proposed improvement strategy, exchanging only the calibration data, is convenient because it does not require retraining the underlying model.
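The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple class-conditional (Mondrian) inductive conformal predictor, a fixed logistic scoring function standing in for the trained model, and synthetic one-dimensional data in which a constant `shift` mimics data drift. All function names (`model_prob_active`, `sample`, `calibrate`, `error_rate`) are hypothetical and chosen for illustration only.

```python
import math
import random
from bisect import bisect_left


def model_prob_active(x):
    """Stand-in for an already trained classifier; it is never retrained here."""
    return 1.0 / (1.0 + math.exp(-2.0 * x))


def nonconformity(x, label):
    """Nonconformity score: 1 minus the predicted probability of `label`."""
    p_active = model_prob_active(x)
    return 1.0 - (p_active if label == 1 else 1.0 - p_active)


def sample(rng, n, shift=0.0):
    """Toy data: inactive ~ N(-1, 1), active ~ N(+1, 1); `shift` mimics drift."""
    data = []
    for i in range(n):
        label = i % 2  # balanced classes
        data.append((rng.gauss(2 * label - 1 + shift, 1.0), label))
    return data


def calibrate(cal_data):
    """Collect sorted calibration scores separately for each class."""
    scores = {0: [], 1: []}
    for x, y in cal_data:
        scores[y].append(nonconformity(x, y))
    return {y: sorted(s) for y, s in scores.items()}


def p_value(cal_scores, x, label):
    """Conformal p-value of `label` for object `x` against its class scores."""
    s = cal_scores[label]
    alpha = nonconformity(x, label)
    n_at_least = len(s) - bisect_left(s, alpha)  # calibration scores >= alpha
    return (n_at_least + 1) / (len(s) + 1)


def error_rate(cal_scores, test_data, significance):
    """Fraction of test objects whose true label is left out of the prediction set."""
    errors = sum(p_value(cal_scores, x, y) <= significance for x, y in test_data)
    return errors / len(test_data)


rng = random.Random(0)
significance = 0.2
old_cal = calibrate(sample(rng, 2000))             # same distribution as training
new_cal = calibrate(sample(rng, 2000, shift=1.0))  # more recent, drifted data
drifted_test = sample(rng, 2000, shift=1.0)        # external, drifted test set

err_old = error_rate(old_cal, drifted_test, significance)
err_new = error_rate(new_cal, drifted_test, significance)
print(f"error with original calibration set:  {err_old:.3f}")
print(f"error with exchanged calibration set: {err_new:.3f}")
```

On this toy data, the error rate obtained with the original calibration set should land clearly above the chosen significance level of 0.2, which is the kind of deviation the abstract describes as a sign of drift. Exchanging only the calibration scores for ones computed on drifted data should bring the error rate back close to 0.2, while `model_prob_active` stays untouched.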

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/61e2/8082859/de35daddfd77/13321_2021_511_Fig1_HTML.jpg

Similar articles

Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data.

Sci Rep. 2022 May 4;12(1):7244. doi: 10.1038/s41598-022-09309-3.

Conformal Prediction Classification of a Large Data Set of Environmental Chemicals from ToxCast and Tox21 Estrogen Receptor Assays.

Chem Res Toxicol. 2016 Jun 20;29(6):1003-10. doi: 10.1021/acs.chemrestox.6b00037. Epub 2016 May 13.

Synergy conformal prediction applied to large-scale bioactivity datasets and in federated learning.

J Cheminform. 2021 Oct 2;13(1):77. doi: 10.1186/s13321-021-00555-7.

Dynamic applicability domain (dAD): compound-target binding affinity estimates with local conformal prediction.

Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad465.

Predicting With Confidence: Using Conformal Prediction in Drug Discovery.

J Pharm Sci. 2021 Jan;110(1):42-49. doi: 10.1016/j.xphs.2020.09.055. Epub 2020 Oct 17.

Deep Learning-Based Conformal Prediction of Toxicity.

J Chem Inf Model. 2021 Jun 28;61(6):2648-2657. doi: 10.1021/acs.jcim.1c00208. Epub 2021 May 27.

Applying Mondrian Cross-Conformal Prediction To Estimate Prediction Confidence on Large Imbalanced Bioactivity Data Sets.

J Chem Inf Model. 2017 Jul 24;57(7):1591-1598. doi: 10.1021/acs.jcim.7b00159. Epub 2017 Jun 30.

CPSign: conformal prediction for cheminformatics modeling.

J Cheminform. 2024 Jun 28;16(1):75. doi: 10.1186/s13321-024-00870-9.

Machine learning algorithms for the prediction of conception success to a given insemination in lactating dairy cows.

J Dairy Sci. 2015 Aug;98(8):5262-73. doi: 10.3168/jds.2014-8984.

Cited by

CPSign: conformal prediction for cheminformatics modeling.

J Cheminform. 2024 Jun 28;16(1):75. doi: 10.1186/s13321-024-00870-9.

Reliable anti-cancer drug sensitivity prediction and prioritization.

Sci Rep. 2024 May 29;14(1):12303. doi: 10.1038/s41598-024-62956-6.

Federated Learning in Computational Toxicology: An Industrial Perspective on the Effiris Hackathon.

Chem Res Toxicol. 2023 Sep 18;36(9):1503-1517. doi: 10.1021/acs.chemrestox.3c00137. Epub 2023 Aug 16.

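Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data.

Sci Rep. 2022 May 4;12(1):7244. doi: 10.1038/s41598-022-09309-3.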

Synergy conformal prediction applied to large-scale bioactivity datasets and in federated learning.

J Cheminform. 2021 Oct 2;13(1):77. doi: 10.1186/s13321-021-00555-7.

Machine Learning Strategies When Transitioning between Biological Assays.

J Chem Inf Model. 2021 Jul 26;61(7):3722-3733. doi: 10.1021/acs.jcim.1c00293. Epub 2021 Jun 21.
