Synthetic data as an enabler for machine learning applications in medicine.

Authors

Rajotte Jean-Francois, Bergen Robert, Buckeridge David L, El Emam Khaled, Ng Raymond, Strome Elissa

Affiliations

Data Science Institute, University of British Columbia, Vancouver, BC, Canada.

McGill University and McGill University Health Centre, Montreal, QC, Canada.

Publication information

iScience. 2022 Oct 13;25(11):105331. doi: 10.1016/j.isci.2022.105331. eCollection 2022 Nov 18.

DOI: 10.1016/j.isci.2022.105331
PMID: 36325058
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9619172/
Abstract

Synthetic data generation (SDG) is the process of using machine learning methods to train a model that captures the patterns in a real dataset. New, synthetic data can then be generated from that trained model. The synthetic data does not have a one-to-one mapping to the original data or to real patients, and therefore has the potential to preserve privacy. There is growing interest in the application of synthetic data across health and life sciences, but to fully realize the benefits, further education, research, and policy innovation are required. This article summarizes the opportunities and challenges of SDG for health data and provides directions for how this technology can be leveraged to accelerate data access for secondary purposes.
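
The train-then-sample idea described in the abstract can be illustrated with a minimal sketch. It assumes a toy two-column dataset (age and systolic blood pressure, both invented here) and uses a bivariate Gaussian as a stand-in for the machine learning model; the article does not prescribe a specific generator, and the function names `fit_gaussian_model` and `sample_synthetic` are hypothetical.

```python
import math
import random
import statistics


def fit_gaussian_model(rows):
    """Fit a bivariate Gaussian (means, std devs, correlation) to (x, y) rows."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    std_x, std_y = statistics.stdev(xs), statistics.stdev(ys)
    # Sample covariance and Pearson correlation, computed by hand.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (len(rows) - 1)
    rho = cov / (std_x * std_y)
    return {"mean": (mean_x, mean_y), "std": (std_x, std_y), "rho": rho}


def sample_synthetic(model, n, rng):
    """Draw n new records from the fitted model rather than from the real rows."""
    (mean_x, mean_y) = model["mean"]
    (std_x, std_y) = model["std"]
    rho = model["rho"]
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mean_x + std_x * z1
        # Mix z1 and z2 so that x and y have correlation rho.
        y = mean_y + std_y * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        out.append((x, y))
    return out


rng = random.Random(0)

# Hypothetical "real" dataset: age and a correlated blood pressure value.
real = []
for _ in range(500):
    age = rng.gauss(55, 12)
    sbp = 90 + 0.6 * age + rng.gauss(0, 8)
    real.append((age, sbp))

model = fit_gaussian_model(real)               # step 1: train on real data
synthetic = sample_synthetic(model, 500, rng)  # step 2: generate new records
synthetic_model = fit_gaussian_model(synthetic)

print("real correlation:     ", round(model["rho"], 3))
print("synthetic correlation:", round(synthetic_model["rho"], 3))
```

The synthetic rows are drawn from the fitted distribution, so they reproduce the aggregate pattern (here, the age and blood pressure correlation) without being copies of individual real rows. That absence of a one-to-one mapping is what gives the approach its privacy potential, although it is not by itself a privacy guarantee.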

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c4f/9619172/45c80fe40b1c/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c4f/9619172/95e55fdde6d1/fx1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c4f/9619172/2b80b40322a6/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c4f/9619172/583605d61dd9/gr2.jpg

Similar articles

1
Synthetic data as an enabler for machine learning applications in medicine.
iScience. 2022 Oct 13;25(11):105331. doi: 10.1016/j.isci.2022.105331. eCollection 2022 Nov 18.
2
Opportunities and Challenges of Synthetic Data Generation in Oncology.
JCO Clin Cancer Inform. 2023 Aug;7:e2300045. doi: 10.1200/CCI.23.00045.
3
Demonstrating the successful application of synthetic learning in spine surgery for training multi-center models with increased patient privacy.
Sci Rep. 2023 Aug 1;13(1):12481. doi: 10.1038/s41598-023-39458-y.
4
Synthetic Data Generation by Artificial Intelligence to Accelerate Research and Precision Medicine in Hematology.
JCO Clin Cancer Inform. 2023 Jun;7:e2300021. doi: 10.1200/CCI.23.00021.
5
Decentralised, collaborative, and privacy-preserving machine learning for multi-hospital data.
EBioMedicine. 2024 Mar;101:105006. doi: 10.1016/j.ebiom.2024.105006. Epub 2024 Feb 19.
6
Synthetic Medical Images for Robust, Privacy-Preserving Training of Artificial Intelligence: Application to Retinopathy of Prematurity Diagnosis.
Ophthalmol Sci. 2022 Feb 11;2(2):100126. doi: 10.1016/j.xops.2022.100126. eCollection 2022 Jun.
7
Reliability of Supervised Machine Learning Using Synthetic Data in Health Care: Model to Preserve Privacy for Data Sharing.
JMIR Med Inform. 2020 Jul 20;8(7):e18910. doi: 10.2196/18910.
8
Privacy preserving Generative Adversarial Networks to model Electronic Health Records.
Neural Netw. 2022 Sep;153:339-348. doi: 10.1016/j.neunet.2022.06.022. Epub 2022 Jun 25.
9
Assessment of differentially private synthetic data for utility and fairness in end-to-end machine learning pipelines for tabular data.
PLoS One. 2024 Feb 5;19(2):e0297271. doi: 10.1371/journal.pone.0297271. eCollection 2024.
10
A review on utilizing machine learning technology in the fields of electronic emergency triage and patient priority systems in telemedicine: Coherent taxonomy, motivations, open research challenges and recommendations for intelligent future work.
Comput Methods Programs Biomed. 2021 Sep;209:106357. doi: 10.1016/j.cmpb.2021.106357. Epub 2021 Aug 16.

Cited by

1
Radiomics Quality Score 2.0: towards radiomics readiness levels and clinical translation for personalized medicine.
Nat Rev Clin Oncol. 2025 Sep 3. doi: 10.1038/s41571-025-01067-1.
2
Tempered enthusiasm by interviewed experts for synthetic data and ELSI checklists for AI in medicine.
AI Ethics. 2025;5(3):3241-3254. doi: 10.1007/s43681-024-00652-x. Epub 2025 Jan 10.
3
Digital twins, synthetic patient data, and in-silico trials: can they empower paediatric clinical trials?
Lancet Digit Health. 2025 May;7(5):100851. doi: 10.1016/j.landig.2025.01.007. Epub 2025 May 13.
4
On the fidelity versus privacy and utility trade-off of synthetic patient data.
iScience. 2025 Apr 14;28(5):112382. doi: 10.1016/j.isci.2025.112382. eCollection 2025 May 16.
5
Validity of tremor analysis using smartphone compatible computer vision frameworks.
Sci Rep. 2025 Apr 18;15(1):13391. doi: 10.1038/s41598-025-97252-4.
6
Synthetic data generation: a privacy-preserving approach to accelerate rare disease research.
Front Digit Health. 2025 Mar 18;7:1563991. doi: 10.3389/fdgth.2025.1563991. eCollection 2025.
7
Diffusion MRI GAN synthesizing fibre orientation distribution data using generative adversarial networks.
Commun Biol. 2025 Mar 28;8(1):512. doi: 10.1038/s42003-025-07936-w.
8
Synthetic data as an investigative tool in hypertension and renal diseases research.
World J Methodol. 2025 Mar 20;15(1):98626. doi: 10.5662/wjm.v15.i1.98626.
9
The Impact of Radiotherapy and Attenuated Chemotherapy Regimens in Older Patients with Classic Hodgkin Lymphoma: A Real-Life Study from the ReLLi Network.
Cancers (Basel). 2025 Feb 24;17(5):765. doi: 10.3390/cancers17050765.
10
AI-driven synthetic data generation for accelerating hepatology research: A study of the United Network for Organ Sharing (UNOS) database.
Hepatology. 2025 Mar 11. doi: 10.1097/HEP.0000000000001299.

References cited in this article

1
Private measures, random walks, and synthetic data.
Probab Theory Relat Fields. 2024;189(1-2):569-611. doi: 10.1007/s00440-024-01279-z. Epub 2024 Apr 20.
2
Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study.
JMIR Med Inform. 2022 Apr 7;10(4):e35734. doi: 10.2196/35734.
3
Head and neck tumor segmentation in PET/CT: The HECKTOR challenge.
Med Image Anal. 2022 Apr;77:102336. doi: 10.1016/j.media.2021.102336. Epub 2021 Dec 25.
4
Data-sharing practices in publications funded by the Canadian Institutes of Health Research: a descriptive analysis.
CMAJ Open. 2021 Nov 9;9(4):E980-E987. doi: 10.9778/cmajo.20200303. Print 2021 Oct-Dec.
5
HIPAA and the Leak of "Deidentified" EHR Data. Reply.
N Engl J Med. 2021 Sep 16;385(12):e38. doi: 10.1056/NEJMc2111490.
6
Remove obstacles to sharing health data with researchers outside of the European Union.
Nat Med. 2021 Aug;27(8):1329-1333. doi: 10.1038/s41591-021-01460-0.
7
Synthetic data in machine learning for medicine and healthcare.
Nat Biomed Eng. 2021 Jun;5(6):493-497. doi: 10.1038/s41551-021-00751-8.
8
Robo-writers: the rise and risks of language-generating AI.
Nature. 2021 Mar;591(7848):22-25. doi: 10.1038/d41586-021-00530-0.
9
Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation.
J Med Internet Res. 2020 Nov 16;22(11):e23139. doi: 10.2196/23139.
10
The future of digital health with federated learning.
NPJ Digit Med. 2020 Sep 14;3:119. doi: 10.1038/s41746-020-00323-1. eCollection 2020.