

Analyzing Medical Research Results Based on Synthetic Data and Their Relation to Real Data Results: Systematic Comparison From Five Observational Studies.

Author information

Reiner Benaim Anat, Almog Ronit, Gorelik Yuri, Hochberg Irit, Nassar Laila, Mashiach Tanya, Khamaisi Mogher, Lurie Yael, Azzam Zaher S, Khoury Johad, Kurnik Daniel, Beyar Rafael

Affiliations

Clinical Epidemiology Unit, Rambam Health Care Campus, Haifa, Israel.

School of Public Health, University of Haifa, Haifa, Israel.

Publication information

JMIR Med Inform. 2020 Feb 20;8(2):e16492. doi: 10.2196/16492.

DOI: 10.2196/16492
PMID: 32130148
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7059086/
Abstract

BACKGROUND

Privacy restrictions limit access to protected patient-derived health information for research purposes. Consequently, data anonymization is required to give researchers access to data for initial analysis before institutional review board approval is granted. A system installed and activated at our institution generates synthetic data that mimic data from real electronic medical records but list only fictitious patients.

OBJECTIVE

This paper aimed to validate the results obtained when analyzing synthetic structured data for medical research. A comprehensive validation process concerning meaningful clinical questions and various types of data was conducted to assess the accuracy and precision of statistical estimates derived from synthetic patient data.

METHODS

A cross-hospital project was conducted to validate results obtained from synthetic data produced for five contemporary studies on various topics. For each study, results derived from synthetic data were compared with those based on real data. In addition, repeatedly generated synthetic datasets were used to estimate the bias and stability of results obtained from synthetic data.
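The abstract does not state which summary statistics were used to quantify bias and stability. The sketch below assumes the simplest reading: compute the same estimate (for example, a mean or a log odds ratio) on the real data and on each of several independently generated synthetic datasets, then report bias as the mean synthetic estimate minus the real estimate and stability as the standard deviation across synthetic replicates. The function name `compare_synthetic_to_real`, the extra "within range" and "same direction" checks, and the numbers in the usage line are hypothetical illustrations, not the authors' method or data.

```python
import statistics
from typing import Sequence


def compare_synthetic_to_real(real_estimate: float,
                              synthetic_estimates: Sequence[float]) -> dict:
    """Hypothetical helper: relate estimates from repeated synthetic datasets
    to the estimate obtained from the real data.

    Assumes the statistic equals 0 when there is no effect (e.g. a log odds
    ratio or a difference in means), so its sign gives the direction of effect.
    """
    if len(synthetic_estimates) < 2:
        raise ValueError("need at least two synthetic replicates to assess stability")
    mean_syn = statistics.fmean(synthetic_estimates)
    return {
        # Bias: average deviation of the synthetic estimates from the real estimate.
        "bias": mean_syn - real_estimate,
        # Stability: sample standard deviation across repeated synthetic generations.
        "sd": statistics.stdev(synthetic_estimates),
        # Does the real-data estimate fall inside the range of synthetic estimates?
        "real_within_range": min(synthetic_estimates) <= real_estimate <= max(synthetic_estimates),
        # Fraction of replicates whose estimate points in the same direction as the real one.
        "same_direction": sum((s > 0) == (real_estimate > 0) for s in synthetic_estimates)
                          / len(synthetic_estimates),
    }


# Toy usage with invented numbers: one real estimate, five synthetic replicates.
summary = compare_synthetic_to_real(0.40, [0.35, 0.42, 0.38, 0.45, 0.37])
print(summary)  # real_within_range is True and same_direction is 1.0 here
```

Reporting the spread across replicates separately from the bias mirrors the distinction the abstract draws between accuracy (closeness to the real result) and precision or stability (consistency across regenerated synthetic datasets).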

RESULTS

This study demonstrated that results derived from synthetic data were predictive of results from real data. When the number of patients was large relative to the number of variables used, highly accurate and strongly consistent results were observed between synthetic and real data. For studies based on smaller populations that accounted for confounders and modifiers by multivariate models, predictions were of moderate accuracy, yet clear trends were correctly observed.

CONCLUSIONS

The use of synthetic structured data provides a close estimate to real data results and is thus a powerful tool in shaping research hypotheses and accessing estimated analyses, without risking patient privacy. Synthetic data enable broad access to data (eg, for out-of-organization researchers), and rapid, safe, and repeatable analysis of data in hospitals or other health organizations where patient privacy is a primary value.



Similar articles

1. Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification.
2. Synthetic data in cancer and cerebrovascular disease research: A novel approach to big data.
PLoS One. 2024 Feb 7;19(2):e0295921. doi: 10.1371/journal.pone.0295921. eCollection 2024.
3. Spot the difference: comparing results of analyses from real patient data and synthetic derivatives.
JAMIA Open. 2020 Dec 14;3(4):557-566. doi: 10.1093/jamiaopen/ooaa060. eCollection 2020 Dec.
4. Creating High-Quality Synthetic Health Data: Framework for Model Development and Validation.
JMIR Form Res. 2024 Apr 22;8:e53241. doi: 10.2196/53241.
5. Privacy-preserving data cube for electronic medical records: An experimental evaluation.
Int J Med Inform. 2017 Jan;97:33-42. doi: 10.1016/j.ijmedinf.2016.09.008. Epub 2016 Sep 24.
6. Generating synthetic mixed discrete-continuous health records with mixed sum-product networks.
J Am Med Inform Assoc. 2022 Dec 13;30(1):16-25. doi: 10.1093/jamia/ocac184.
7. The Problem of Fairness in Synthetic Healthcare Data.
Entropy (Basel). 2021 Sep 4;23(9):1165. doi: 10.3390/e23091165.
8. Effects of short-term exposure to air pollution on hospital admissions of young children for acute lower respiratory infections in Ho Chi Minh City, Vietnam.
Res Rep Health Eff Inst. 2012 Jun(169):5-72; discussion 73-83.
9. A method for generating synthetic longitudinal health data.
BMC Med Res Methodol. 2023 Mar 23;23(1):67. doi: 10.1186/s12874-023-01869-w.

Cited by

1. Medical data sharing and synthetic clinical data generation - maximizing biomedical resource utilization and minimizing participant re-identification risks.
NPJ Digit Med. 2025 Aug 16;8(1):526. doi: 10.1038/s41746-025-01935-1.
2. Treatment disparities and prognostic implications in octogenarians versus non-octogenarians with high-gradient severe aortic stenosis.
Open Heart. 2025 Aug 14;12(2):e003405. doi: 10.1136/openhrt-2025-003405.
3. A machine learning approach for diagnostic and prognostic predictions, key risk factors and interactions.
Health Serv Outcomes Res Methodol. 2025 Mar;25(1):1-28. doi: 10.1007/s10742-024-00324-7. Epub 2024 Mar 18.
4. Differences in technical and clinical perspectives on AI validation in cancer imaging: mind the gap!
Eur Radiol Exp. 2025 Jan 15;9(1):7. doi: 10.1186/s41747-024-00543-0.
5. Decades in the Making: The Evolution of Digital Health Research Infrastructure Through Synthetic Data, Common Data Models, and Federated Learning.
J Med Internet Res. 2024 Dec 20;26:e58637. doi: 10.2196/58637.
6. Actionability of Synthetic Data in a Heterogeneous and Rare Health Care Demographic: Adolescents and Young Adults With Cancer.
JCO Clin Cancer Inform. 2024 Dec;8:e2400056. doi: 10.1200/CCI.24.00056. Epub 2024 Dec 3.
7. Trends in Patient Characteristics and Cardiothoracic Surgeries over 14 Years (2010-2023): A Single Center Experience.
J Clin Med. 2024 Oct 28;13(21):6467. doi: 10.3390/jcm13216467.
8. Large language models and synthetic health data: progress and prospects.
JAMIA Open. 2024 Oct 26;7(4):ooae114. doi: 10.1093/jamiaopen/ooae114. eCollection 2024 Dec.
9. Early Detection of Pulmonary Embolism in a General Patient Population Immediately Upon Hospital Admission Using Machine Learning to Identify New, Unidentified Risk Factors: Model Development Study.
J Med Internet Res. 2024 Jul 30;26:e48595. doi: 10.2196/48595.
10. The CRP troponin test (CTT) stratifies mortality risk in patients with non-ST elevation myocardial infarction (NSTEMI).
Clin Cardiol. 2024 Apr;47(4):e24256. doi: 10.1002/clc.24256.