Assessing the calibration in toxicological in vitro models with conformal prediction.

Author information

Morger Andrea, Svensson Fredrik, Arvidsson McShane Staffan, Gauraha Niharika, Norinder Ulf, Spjuth Ola, Volkamer Andrea

Affiliations

In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin, Berlin, Germany.

Alzheimer's Research UK UCL Drug Discovery Institute, London, WC1E 6BT, UK.

Publication information

J Cheminform. 2021 Apr 29;13(1):35. doi: 10.1186/s13321-021-00511-5.

Abstract

Machine learning methods are widely used in drug discovery and toxicity prediction. While they generally perform well in cross-validation studies, their predictive power often drops when the query samples have drifted away from the descriptor space of the training data. Thus, the assumption underlying machine learning algorithms, that training and test data stem from the same distribution, is not always fulfilled. In this work, conformal prediction is used to assess the calibration of the models. Deviations from the expected error rate may indicate that training and test data originate from different distributions. Using the Tox21 datasets as an example, composed of the chronologically released Tox21Train, Tox21Test and Tox21Score subsets, we observed that although internally valid models could be trained using cross-validation on Tox21Train, predictions on the external Tox21Score data resulted in higher error rates than expected. To improve the predictions on the external sets, we successfully introduced a strategy that exchanges the calibration set for more recent data, such as Tox21Test. We conclude that conformal prediction can be used to diagnose data drifts and other issues related to model calibration. The proposed improvement strategy, exchanging only the calibration data, is convenient because it does not require retraining the underlying model.
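
The diagnostic and the recalibration strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the Tox21 data: the fixed logistic `model`, the one-dimensional Gaussian toy data, the shift of -1.0 that stands in for data drift, and every function name are assumptions made for illustration. The sketch uses class-conditional (Mondrian) inductive conformal p-values and compares the observed error rate with the chosen significance level, first with the original calibration set and then with a calibration set drawn from the drifted distribution, leaving the model unchanged.

```python
import bisect
import math
import random


def model(x):
    """Stand-in for an already trained classifier: returns P(label = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-x))


def nonconformity(x, label):
    """Score is high when the model assigns low probability to `label`."""
    p1 = model(x)
    return 1.0 - (p1 if label == 1 else 1.0 - p1)


def make_data(n, shift, rng):
    """Toy 1-D data (assumption): class 1 centred at 1 + shift, class 0 at -1 + shift."""
    data = []
    for i in range(n):
        label = i % 2
        mean = (1.0 if label == 1 else -1.0) + shift
        data.append((rng.gauss(mean, 1.0), label))
    return data


def calibrate(cal_data):
    """Sorted calibration scores per class (Mondrian); the model is not refit."""
    scores = {0: [], 1: []}
    for x, label in cal_data:
        scores[label].append(nonconformity(x, label))
    return {label: sorted(s) for label, s in scores.items()}


def p_value(sorted_scores, score):
    """Conformal p-value: share of calibration scores at least as large as `score`."""
    n_geq = len(sorted_scores) - bisect.bisect_left(sorted_scores, score)
    return (n_geq + 1) / (len(sorted_scores) + 1)


def error_rate(cal_scores, test_data, significance):
    """Share of test samples whose true label is excluded from the prediction set."""
    errors = sum(
        p_value(cal_scores[label], nonconformity(x, label)) <= significance
        for x, label in test_data
    )
    return errors / len(test_data)


rng = random.Random(0)
significance = 0.2

old_cal = calibrate(make_data(4000, 0.0, rng))    # same distribution as "training"
new_cal = calibrate(make_data(4000, -1.0, rng))   # more recent, drifted data
old_test = make_data(4000, 0.0, rng)
drifted_test = make_data(4000, -1.0, rng)

err_internal = error_rate(old_cal, old_test, significance)
err_drifted = error_rate(old_cal, drifted_test, significance)
err_recalibrated = error_rate(new_cal, drifted_test, significance)
print(err_internal, err_drifted, err_recalibrated)
```

In this toy setup the first and third error rates should land close to the significance level of 0.2, while the second should clearly exceed it, which mirrors the pattern the abstract reports: a model that is valid internally, miscalibrated on drifted data, and recoverable by exchanging only the calibration set.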


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/61e2/8082859/de35daddfd77/13321_2021_511_Fig1_HTML.jpg
