Data reconstruction from machine learning models via inverse estimation and Bayesian inference.

Authors

Hartoyo Agus, Ciupek Dominika, Malawski Maciej, Crimi Alessandro

Affiliations

Sano Centre for Computational Medicine, Kraków, Poland.

Telkom University, School of Computing, Bandung, Indonesia.

Publication information

Sci Rep. 2025 Apr 22;15(1):13856. doi: 10.1038/s41598-025-96215-z.

Abstract

This study explores the task of data reconstruction from machine learning models via inverse estimation and Bayesian inference, with the goal of recovering the original dataset solely based on the trained model. We introduce a novel theoretical framework that investigates the factors affecting the data reconstruction quality. Specifically, we derive expressions that quantify how variations in key variables influence the divergence between true and estimated posteriors by examining the concurrent behavior of their partial derivatives with respect to independent variables. This derivative-based approach establishes theoretical correlations between the variables, demonstrating that the fidelity of the recovered data is governed by two primary factors: (1) the accuracy of the assumed prior, and (2) the accuracy of the machine learning model. Empirical results across multiple benchmark datasets and machine learning algorithms corroborate these theoretical predictions, reinforcing the validity and robustness of our theoretical framework. Practically, our data reconstruction method enables the creation of synthetic models that closely replicate the performance of the original models. This work contributes to advancing the theoretical understanding and practical techniques for data reconstruction and model introspection within the context of machine learning.
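To make the idea concrete, here is a minimal sketch of input reconstruction by Bayesian inference. It is not the authors' implementation: the logistic-regression weights, the isotropic Gaussian prior, and the random-walk Metropolis sampler are all assumptions chosen for illustration, and every function name is hypothetical. It treats the trained model's output as a likelihood p(y | x), combines it with an assumed prior p(x), and samples candidate inputs from the resulting posterior p(x | y).

```python
import numpy as np

def model_proba(x, w, b):
    """Hypothetical trained model: logistic regression returning P(y=1 | x)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def log_prior(x, mean, std):
    """Assumed isotropic Gaussian prior over inputs (up to a constant)."""
    return -0.5 * np.sum(((x - mean) / std) ** 2)

def log_posterior(x, y, w, b, mean, std):
    """log p(x | y) up to a constant: log p(y | x) + log p(x)."""
    p = np.clip(model_proba(x, w, b), 1e-12, 1 - 1e-12)
    log_lik = np.log(p) if y == 1 else np.log(1.0 - p)
    return log_lik + log_prior(x, mean, std)

def reconstruct(y, w, b, mean, std, n_steps=5000, step=0.5, seed=0):
    """Random-walk Metropolis sampler over inputs x for a given label y."""
    rng = np.random.default_rng(seed)
    x = np.array(mean, dtype=float)
    lp = log_posterior(x, y, w, b, mean, std)
    samples = []
    for _ in range(n_steps):
        proposal = x + step * rng.standard_normal(x.shape)
        lp_new = log_posterior(proposal, y, w, b, mean, std)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_new - lp:
            x, lp = proposal, lp_new
        samples.append(x.copy())
    return np.array(samples[n_steps // 2:])  # discard first half as burn-in

# Illustrative (made-up) model parameters and prior.
w = np.array([2.0, -1.0])
b = 0.0
prior_mean = np.zeros(2)
prior_std = 1.0

x_pos = reconstruct(1, w, b, prior_mean, prior_std)  # inputs consistent with y=1
x_neg = reconstruct(0, w, b, prior_mean, prior_std)  # inputs consistent with y=0
```

In this toy setting the two factors named in the abstract appear as explicit inputs: changing `prior_mean` or `prior_std` changes the assumed prior, and changing `w` or `b` changes the model whose predictions act as the likelihood. Either change shifts the sampled inputs.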

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/07e0/12015503/f106bf7ccca3/41598_2025_96215_Fig1_HTML.jpg
