ChatGPT-based Biological and Psychological Data Imputation

Authors

Nazir Anam, Cheeema Muhammad Nadeem, Wang Ze

Affiliation

Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine.

Publication information

Meta Radiol. 2023 Nov;1(3). doi: 10.1016/j.metrad.2023.100034. Epub 2023 Nov 11.

DOI:10.1016/j.metrad.2023.100034

PMID:38784385

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11115380/

Abstract

Missing data are a common problem in large cohort and longitudinal research and are usually handled through data imputation. Because they rely on simplified models such as linear or nonlinear interpolation, current imputation methods may not be accurate for real-life data such as biological and behavioral data. The purpose of this work was to explore the capability of ChatGPT, a powerful Large Language Model (LLM) developed by OpenAI, for biological and psychological data imputation. We tested feasibility using data from the Human Connectome Project. Performance was evaluated by comparing the imputed data against known ground truth (GT), using metrics such as the Pearson correlation coefficient (r), relative accuracy (MP), and mean absolute error (MAE). Comparative analyses with traditional imputation techniques were also conducted to demonstrate the superior efficacy of ChatGPT as a data imputer. In summary, through customized data-to-text prompt engineering, ChatGPT can successfully capture intricate patterns and dependencies within biological data, resulting in precise imputations. Fine-tuning ChatGPT with domain-specific biological vocabulary, with a human in the loop as an interpreter, enhances the accuracy and relevance of the imputations.
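
The evaluation described in the abstract compares imputed values against known ground truth. The following is a minimal sketch of two of the named metrics, the Pearson correlation coefficient (r) and the mean absolute error (MAE). The function names and the toy values are assumptions made for illustration and are not taken from the study. Relative accuracy (MP) is left out because the abstract does not define it.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from each mean (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(truth, imputed):
    """Average absolute difference between ground truth and imputed values."""
    return sum(abs(t - p) for t, p in zip(truth, imputed)) / len(truth)


# Hypothetical toy values (NOT data from the study): entries that were
# masked out of a dataset, and the values an imputer returned for them.
ground_truth = [10.0, 12.0, 14.0, 16.0]
imputed = [11.0, 12.0, 13.0, 17.0]

r = pearson_r(ground_truth, imputed)
mae = mean_absolute_error(ground_truth, imputed)
print(f"r = {r:.3f}, MAE = {mae:.3f}")
```

Under this setup, a higher r and a lower MAE on the masked entries would indicate a better imputer, and the same two functions can score any baseline (for example, mean imputation) on the same masked entries.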

Similar articles

ChatGPT yields low accuracy in determining LI-RADS scores based on free-text and structured radiology reports in German language.

Front Radiol. 2024 Jul 5;4:1390774. doi: 10.3389/fradi.2024.1390774. eCollection 2024.

The performance of prognostic models depended on the choice of missing value imputation algorithm: a simulation study.

J Clin Epidemiol. 2024 Dec;176:111539. doi: 10.1016/j.jclinepi.2024.111539. Epub 2024 Sep 24.

Triage Performance Across Large Language Models, ChatGPT, and Untrained Doctors in Emergency Medicine: Comparative Study.

J Med Internet Res. 2024 Jun 14;26:e53297. doi: 10.2196/53297.

Accuracy of a Commercial Large Language Model (ChatGPT) to Perform Disaster Triage of Simulated Patients Using the Simple Triage and Rapid Treatment (START) Protocol: Gage Repeatability and Reproducibility Study.

J Med Internet Res. 2024 Sep 30;26:e55648. doi: 10.2196/55648.

Leveraging Large Language Models (LLM) for the Plastic Surgery Resident Training: Do They Have a Role?

Indian J Plast Surg. 2023 Aug 28;56(5):413-420. doi: 10.1055/s-0043-1772704. eCollection 2023 Oct.

Performance of Large Language Models (ChatGPT, Bing Search, and Google Bard) in Solving Case Vignettes in Physiology.

Cureus. 2023 Aug 4;15(8):e42972. doi: 10.7759/cureus.42972. eCollection 2023 Aug.

Multiple imputation for non-response when estimating HIV prevalence using survey data.

BMC Public Health. 2015 Oct 16;15:1059. doi: 10.1186/s12889-015-2390-1.

ChatGPT and the Future of Journal Reviews: A Feasibility Study.

Yale J Biol Med. 2023 Sep 29;96(3):415-420. doi: 10.59249/SKDH9286. eCollection 2023 Sep.

Collaborative Enhancement of Consistency and Accuracy in US Diagnosis of Thyroid Nodules Using Large Language Models.

Radiology. 2024 Mar;310(3):e232255. doi: 10.1148/radiol.232255.

Cited by

Transformer-based arterial spin labeling perfusion MRI denoising.

Vis Comput. 2025 Jul 3. doi: 10.1007/s00371-025-04061-x.
