Spot the difference: comparing results of analyses from real patient data and synthetic derivatives.

Authors

Foraker Randi E, Yu Sean C, Gupta Aditi, Michelson Andrew P, Pineda Soto Jose A, Colvin Ryan, Loh Francis, Kollef Marin H, Maddox Thomas, Evanoff Bradley, Dror Hovav, Zamstein Noa, Lai Albert M, Payne Philip R O

Affiliations

Division of General Medical Sciences, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.

Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.

Publication information

JAMIA Open. 2020 Dec 14;3(4):557-566. doi: 10.1093/jamiaopen/ooaa060. eCollection 2020 Dec.

DOI:10.1093/jamiaopen/ooaa060

PMID:33623891

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7886551/

Abstract

BACKGROUND

Synthetic data may provide a solution to researchers who wish to generate and share data in support of precision healthcare. Recent advances in data synthesis enable the creation and analysis of synthetic derivatives as if they were the original data; this process has significant advantages over data deidentification.

OBJECTIVES

To assess a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel) for its ability to produce data that can be used for research purposes while obviating privacy and confidentiality concerns.

METHODS

We explored three use cases and tested the robustness of synthetic data by comparing analyses of synthetic derivatives with the same analyses of the original data, using traditional statistics, machine learning approaches, and spatial representations of the data. We designed these use cases to cover analyses at the observation level (Use Case 1), of patient cohorts (Use Case 2), and of population-level data (Use Case 3).

RESULTS

For each use case, the results of the analyses were sufficiently statistically similar (P > 0.05) between the synthetic derivative and the real data to draw the same conclusions.
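
The kind of comparison described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract says only that "traditional statistics" were used, so the choice of a two-sided permutation test on the difference in means is an assumption, the function name `permutation_p_value` is hypothetical, and the sample values are invented for the example rather than taken from the study.

```python
import random
from statistics import mean


def permutation_p_value(real, synthetic, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    observed = abs(mean(real) - mean(synthetic))
    pooled = list(real) + list(synthetic)
    n_real = len(real)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_real]) - mean(pooled[n_real:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical values, for illustration only (not data from the study).
real_values = [98, 102, 110, 95, 101, 99, 104, 97, 103, 100]
synthetic_values = [99, 101, 108, 96, 102, 98, 105, 97, 102, 101]

p = permutation_p_value(real_values, synthetic_values)
print(f"p = {p:.3f}")
```

Under this sketch, a p-value above 0.05 would mean the test found no detectable difference in means between the two samples, which is the sense in which the results above are described as statistically similar.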

DISCUSSION AND CONCLUSION

This article presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in clinical research for faster insights and improved data sharing in support of precision healthcare.
