Barr Austin A, Quan Joshua, Guo Eddie, Sezgin Emre
Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.
The Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH, United States.
Front Artif Intell. 2025 Feb 5;8:1533508. doi: 10.3389/frai.2025.1533508. eCollection 2025.
Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.
This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI's GPT-4o using zero-shot prompting, and evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.
In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.
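The three fidelity checks named above can be sketched from summary statistics alone. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the unequal-variance (Welch) form of the t statistic is an assumption, and the p-values use a normal approximation to the t distribution, which is only reasonable for groups with thousands of cases.

```python
import math


def normal_two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic (unequal variances) from summary statistics.

    The p-value uses a normal approximation, an assumption that holds
    only when both samples are large.
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / se
    return t, normal_two_sided_p(t)


def two_proportion_z(x1, n1, x2, n2):
    """Two-sample proportion z-test with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, normal_two_sided_p(z)


def mean_ci95(mean, sd, n):
    """Approximate 95% confidence interval for a mean (z = 1.96)."""
    half_width = 1.96 * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]
```

A continuous parameter would be compared by passing the mean, standard deviation, and sample size of the real and synthetic columns to `welch_t_from_summary`, then checking `intervals_overlap(mean_ci95(...), mean_ci95(...))`; a binary parameter would go through `two_proportion_z` with the counts of positive cases.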
In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. Values were within plausible ranges, and body mass index was correctly calculated for all case files from the respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.
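The cross-verification of body mass index against height and weight can be illustrated with a short check. This is a sketch under stated assumptions: the field names `height`, `weight`, and `bmi`, the units (centimetres and kilograms), and the rounding tolerance are all hypothetical and not taken from the study.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight in kg divided by height in metres squared."""
    height_m = height_cm / 100
    return weight_kg / (height_m * height_m)


def bmi_mismatches(rows, tolerance=0.1):
    """Return the indices of rows whose reported BMI disagrees with the
    value recomputed from height and weight by more than `tolerance`.

    Each row is a dict with hypothetical keys "height" (cm),
    "weight" (kg), and "bmi".
    """
    return [
        i
        for i, row in enumerate(rows)
        if abs(bmi(row["weight"], row["height"]) - row["bmi"]) > tolerance
    ]
```

An empty list from `bmi_mismatches` would correspond to the result reported above, in which every case file carried a correctly calculated BMI.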
Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.