A critical assessment of using ChatGPT for extracting structured data from clinical notes.

Author Information

Huang Jingwei, Yang Donghan M, Rong Ruichen, Nezafati Kuroush, Treager Colin, Chi Zhikai, Wang Shidan, Cheng Xian, Guo Yujia, Klesse Laura J, Xiao Guanghua, Peterson Eric D, Zhan Xiaowei, Xie Yang

Affiliations

Quantitative Biomedical Research Center, Peter O'Donnell School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA.

Department of Pathology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390, USA.

Publication Information

NPJ Digit Med. 2024 May 1;7(1):106. doi: 10.1038/s41746-024-01079-8.

Abstract

Existing natural language processing (NLP) methods for converting free-text clinical notes into structured data often require problem-specific annotations and model training. This study aims to evaluate ChatGPT's capacity to extract information from free-text medical notes efficiently and comprehensively. We developed a large language model (LLM)-based workflow, using systems engineering methodology and a spiral "prompt engineering" process, and leveraging OpenAI's API for batch querying of ChatGPT. We evaluated the effectiveness of this method using a dataset of more than 1000 lung cancer pathology reports and a dataset of 191 pediatric osteosarcoma pathology reports, comparing the ChatGPT-3.5 (gpt-3.5-turbo-16k) outputs with expert-curated structured data. ChatGPT-3.5 extracted pathological classifications with an overall accuracy of 89% in the lung cancer dataset, outperforming two traditional NLP methods. Performance was influenced by the design of the instructive prompt. Our case analysis shows that most misclassifications were due to a lack of highly specialized pathology terminology and erroneous interpretation of TNM staging rules. Reproducibility testing shows relatively stable performance of ChatGPT-3.5 over time. In the pediatric osteosarcoma dataset, ChatGPT-3.5 classified both grades and margin status accurately, with accuracies of 98.6% and 100%, respectively. Our study shows the feasibility of using ChatGPT to process large volumes of clinical notes for structured information extraction without requiring extensive task-specific human annotation and model training. The results underscore the potential role of LLMs in transforming unstructured healthcare data into structured formats, thereby supporting research and aiding clinical decision-making.
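The batch-querying workflow the abstract describes can be sketched as below. This is a minimal illustration, not the authors' actual pipeline: the field schema, prompt wording, and helper names (`FIELDS`, `build_prompt`, `parse_response`, `extract_batch`) are assumptions for the sketch, and the API call assumes the `openai` Python SDK (v1+) chat-completions interface.

```python
import json

# Hypothetical extraction schema (illustrative; not the paper's exact fields).
FIELDS = ["histologic_type", "grade", "t_stage", "n_stage", "margin_status"]

def build_prompt(report_text):
    """Build a chat message list instructing the model to return strict JSON."""
    system = (
        "You are a pathology data abstractor. Extract the requested fields "
        "from the pathology report below. Respond with a single JSON object "
        "using exactly these keys: " + ", ".join(FIELDS) + ". "
        'Use "not reported" for any field the report does not mention.'
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": report_text},
    ]

def parse_response(raw):
    """Parse the model's reply, tolerating stray text around the JSON object."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in model response")
    record = json.loads(raw[start : end + 1])
    # Keep only the expected keys so downstream tables have a fixed schema.
    return {k: record.get(k, "not reported") for k in FIELDS}

def extract_batch(reports, client, model="gpt-3.5-turbo-16k"):
    """Query the chat API once per report (assumes an openai>=1.0 client)."""
    rows = []
    for text in reports:
        resp = client.chat.completions.create(
            model=model,
            messages=build_prompt(text),
            temperature=0,  # minimize sampling variability across runs
        )
        rows.append(parse_response(resp.choices[0].message.content))
    return rows
```

Pinning the key list in the prompt and re-filtering the parsed JSON against the same list is one way to get the fixed-schema output that expert-curated structured data can be compared against row by row.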

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c24/11063058/09397d10547c/41746_2024_1079_Fig1_HTML.jpg
