Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.

Authors

Barr Austin A, Quan Joshua, Guo Eddie, Sezgin Emre

Affiliations

Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.

The Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH, United States.

Publication information

Front Artif Intell. 2025 Feb 5;8:1533508. doi: 10.3389/frai.2025.1533508. eCollection 2025.

DOI: 10.3389/frai.2025.1533508
PMID: 39974356
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11836953/
Abstract

BACKGROUND

Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.

OBJECTIVE

This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI's GPT-4o using zero-shot prompting, and evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.

METHODS

In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.
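
The three fidelity checks named above can be sketched from summary statistics alone. This is a minimal illustration, not the authors' analysis code: it assumes the unequal-variance (Welch) form of the t statistic, a pooled two-proportion z-test, and normal-approximation p-values and confidence intervals, which are only reasonable for large samples. All function names are hypothetical, and the example numbers are invented for illustration rather than taken from the study.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z95 = STD_NORMAL.inv_cdf(0.975)  # about 1.96


def mean_ci95(mean, sd, n):
    """95% CI for a mean (normal approximation; assumes large n)."""
    half_width = Z95 * sd / sqrt(n)
    return (mean - half_width, mean + half_width)


def ci_overlap(ci_a, ci_b):
    """True when two closed intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch two-sample t statistic from summary statistics.

    Assumption: the two-sided p-value uses the standard normal
    distribution instead of the t distribution (large-sample shortcut).
    """
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / se
    p = 2 * (1 - STD_NORMAL.cdf(abs(t)))
    return t, p


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample proportion z-test; returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p


# Invented example values (NOT from the study): one continuous parameter
# and one binary parameter, real vs. synthetic.
real = {"mean": 57.3, "sd": 14.9, "n": 6000}
synth = {"mean": 57.6, "sd": 15.2, "n": 6000}

t_stat, t_p = welch_t(real["mean"], real["sd"], real["n"],
                      synth["mean"], synth["sd"], synth["n"])
overlap = ci_overlap(mean_ci95(**real), mean_ci95(**synth))
z_stat, z_p = two_proportion_z(3050, 6000, 3010, 6000)

print(f"t = {t_stat:.3f}, p = {t_p:.3f}, 95% CIs overlap: {overlap}")
print(f"z = {z_stat:.3f}, p = {z_p:.3f}")
```

A parameter would count as statistically similar under this sketch when the p-value is at or above the chosen significance level. The abstract does not state that level, so any threshold applied here is a further assumption.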

RESULTS

In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.
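
The body mass index cross-check reported for Phase 1 can be illustrated with a short sketch. It is not the authors' procedure: the column names (`height`, `weight`, `bmi`), the units (centimetres and kilograms), and the rounding tolerance are all assumptions made for the example, and the rows are invented.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight (kg) and height (cm)."""
    height_m = height_cm / 100
    return weight_kg / (height_m * height_m)


def bmi_mismatches(rows, tol=0.1):
    """Return indices of rows whose stored BMI disagrees with the value
    recomputed from height and weight by more than `tol` (assumed tolerance)."""
    return [
        i for i, row in enumerate(rows)
        if abs(bmi(row["weight"], row["height"]) - row["bmi"]) > tol
    ]


# Invented rows (NOT from the study); the second one is deliberately wrong.
rows = [
    {"height": 170, "weight": 70, "bmi": 24.2},
    {"height": 160, "weight": 50, "bmi": 25.0},
]
print(bmi_mismatches(rows))
```

An empty result over a full dataset would correspond to the outcome described above, where BMI was consistent with height and weight for every case file.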

CONCLUSION

Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc3f/11836953/ac858096de19/frai-08-1533508-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc3f/11836953/12f866fe3950/frai-08-1533508-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc3f/11836953/7c65dacee999/frai-08-1533508-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc3f/11836953/676564976e2c/frai-08-1533508-g004.jpg

Similar articles

1
Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.
Front Artif Intell. 2025 Feb 5;8:1533508. doi: 10.3389/frai.2025.1533508. eCollection 2025.
2
Privacy-ensuring Open-weights Large Language Models Are Competitive with Closed-weights GPT-4o in Extracting Chest Radiography Findings from Free-Text Reports.
Radiology. 2025 Jan;314(1):e240895. doi: 10.1148/radiol.240895.
3
AI in Home Care-Evaluation of Large Language Models for Future Training of Informal Caregivers: Observational Comparative Case Study.
J Med Internet Res. 2025 Apr 28;27:e70703. doi: 10.2196/70703.
4
Utility-based Analysis of Statistical Approaches and Deep Learning Models for Synthetic Data Generation With Focus on Correlation Structures: Algorithm Development and Validation.
JMIR AI. 2025 Mar 20;4:e65729. doi: 10.2196/65729.
5
Comparing Commercial and Open-Source Large Language Models for Labeling Chest Radiograph Reports.
Radiology. 2024 Oct;313(1):e241139. doi: 10.1148/radiol.241139.
6
Leveraging Large Language Models for Precision Monitoring of Chemotherapy-Induced Toxicities: A Pilot Study with Expert Comparisons and Future Directions.
Cancers (Basel). 2024 Aug 12;16(16):2830. doi: 10.3390/cancers16162830.
7
Using Synthetic Health Care Data to Leverage Large Language Models for Named Entity Recognition: Development and Validation Study.
J Med Internet Res. 2025 Mar 18;27:e66279. doi: 10.2196/66279.
8
Zero-shot learning for clinical phenotyping: Comparing LLMs and rule-based methods.
Comput Biol Med. 2025 Jun;192(Pt A):110181. doi: 10.1016/j.compbiomed.2025.110181. Epub 2025 Apr 23.
9
Assessing the Application of Large Language Models in Generating Dermatologic Patient Education Materials According to Reading Level: Qualitative Study.
JMIR Dermatol. 2024 May 16;7:e55898. doi: 10.2196/55898.
10
An Evaluation of the Performance of OpenAI-o1 and GPT-4o in the Japanese National Examination for Physical Therapists.
Cureus. 2025 Jan 6;17(1):e76989. doi: 10.7759/cureus.76989. eCollection 2025 Jan.