Application of Bayesian networks to generate synthetic health data.

Author affiliations

Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

Rochester Institute of Technology, Rochester, New York, USA.

Publication information

J Am Med Inform Assoc. 2021 Mar 18;28(4):801-811. doi: 10.1093/jamia/ocaa303.

Abstract

OBJECTIVE

This study seeks to develop a fully automated method of generating synthetic data from a real dataset that could be employed by medical organizations to distribute health data to researchers, reducing the need for access to real data. We hypothesize the application of Bayesian networks will improve upon the predominant existing method, medBGAN, in handling the complexity and dimensionality of healthcare data.

MATERIALS AND METHODS

We employed Bayesian networks to learn probabilistic graphical structures and simulated synthetic patient records from the learned structure. We used the University of California Irvine (UCI) heart disease and diabetes datasets as well as the MIMIC-III diagnoses database. We evaluated our method through statistical tests, machine learning tasks, preservation of rare events, disclosure risk, and the ability of a machine learning classifier to discriminate between the real and synthetic data.
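The generation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the graph is fixed by hand rather than learned, all variables are assumed binary, and the variable names, the `fit_cpts` and `sample_record` helpers, and the toy records are hypothetical. It only shows how conditional probability tables are estimated from records and how new records are then drawn by ancestral sampling.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy structure (variable -> parents). The paper learns the graph
# from data; here it is fixed by hand purely for illustration.
STRUCTURE = {
    "age_over_60": [],
    "diabetes": ["age_over_60"],
    "heart_disease": ["age_over_60", "diabetes"],
}
ORDER = ["age_over_60", "diabetes", "heart_disease"]  # parents before children


def fit_cpts(records, structure, alpha=1.0):
    """Return a function giving P(var = 1 | parent values), estimated by
    counting with add-alpha smoothing (all variables assumed binary)."""
    counts = {var: defaultdict(Counter) for var in structure}
    for rec in records:
        for var, parents in structure.items():
            key = tuple(rec[p] for p in parents)
            counts[var][key][rec[var]] += 1

    def prob_one(var, parent_values):
        c = counts[var][tuple(parent_values)]
        return (c[1] + alpha) / (c[0] + c[1] + 2 * alpha)

    return prob_one


def sample_record(prob_one, structure, order, rng):
    """Ancestral sampling: draw each variable given its already-sampled parents."""
    rec = {}
    for var in order:
        parent_values = tuple(rec[p] for p in structure[var])
        rec[var] = 1 if rng.random() < prob_one(var, parent_values) else 0
    return rec


# Made-up "real" records, standing in for a patient table.
real = (
    [{"age_over_60": 1, "diabetes": 1, "heart_disease": 1}] * 3
    + [{"age_over_60": 0, "diabetes": 0, "heart_disease": 0}] * 1
)
prob_one = fit_cpts(real, STRUCTURE)
rng = random.Random(0)
synthetic = [sample_record(prob_one, STRUCTURE, ORDER, rng) for _ in range(1000)]
```

Smoothing keeps every parent configuration sampleable even when it never appears in the training records. A real pipeline would also need structure learning and support for non-binary variables.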

RESULTS

Our Bayesian network model outperformed or equaled medBGAN in all key metrics. Notable improvement was achieved in capturing rare variables and preserving association rules.
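As a rough illustration of what "capturing rare variables" and "preserving association rules" could mean in practice, the sketch below compares the prevalence of a rare binary variable and the confidence of one rule between a real and a synthetic table. The abstract does not specify the exact metrics used, so the `prevalence` and `rule_confidence` helpers, the variable names, and the data are assumptions for illustration only.

```python
def prevalence(records, var):
    """Fraction of records in which a binary variable equals 1."""
    return sum(r[var] for r in records) / len(records)


def rule_confidence(records, antecedent, consequent):
    """Confidence of the rule antecedent => consequent:
    support(antecedent and consequent) / support(antecedent).
    Returns None when no record satisfies the antecedent."""
    matches = [r for r in records if all(r[a] == 1 for a in antecedent)]
    if not matches:
        return None
    return sum(r[consequent] for r in matches) / len(matches)


# Made-up tables standing in for real and synthetic patient records.
real_table = [
    {"rare_dx": 1, "common_dx": 1},
    {"rare_dx": 0, "common_dx": 1},
    {"rare_dx": 0, "common_dx": 0},
    {"rare_dx": 0, "common_dx": 0},
]
synthetic_table = [
    {"rare_dx": 1, "common_dx": 1},
    {"rare_dx": 0, "common_dx": 1},
    {"rare_dx": 0, "common_dx": 1},
    {"rare_dx": 0, "common_dx": 0},
]

# Absolute gap in rare-variable prevalence and in rule confidence.
prevalence_gap = abs(
    prevalence(real_table, "rare_dx") - prevalence(synthetic_table, "rare_dx")
)
confidence_gap = abs(
    rule_confidence(real_table, ["common_dx"], "rare_dx")
    - rule_confidence(synthetic_table, ["common_dx"], "rare_dx")
)
```

Smaller gaps would suggest that the synthetic table reproduces the rare variable and the rule more faithfully. A full evaluation would compare many variables and rules at once.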

DISCUSSION

Bayesian networks generated data sufficiently similar to the original data with minimal risk of disclosure, while offering additional transparency, computational efficiency, and capacity to handle more data types in comparison to existing methods. We hope this method will allow healthcare organizations to efficiently disseminate synthetic health data to researchers, enabling them to generate hypotheses and develop analytical tools.

CONCLUSION

We conclude the application of Bayesian networks is a promising option for generating realistic synthetic health data that preserves the features of the original data without compromising data privacy.
