The Transformative Potential of Large Language Models in Mining Electronic Health Records Data: Content Analysis.

Authors

Wals Zurita Amadeo Jesus, Miras Del Rio Hector, Ugarte Ruiz de Aguirre Nerea, Nebrera Navarro Cristina, Rubio Jimenez Maria, Muñoz Carmona David, Miguez Sanchez Carlos

Affiliation

Servicio Oncologia Radioterápica, Hospital Universitario Virgen Macarena, Andalusian Health Service, Seville, Spain.

Publication details

JMIR Med Inform. 2025 Jan 2;13:e58457. doi: 10.2196/58457.

DOI:10.2196/58457
PMID:39746191
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11739723/
Abstract

BACKGROUND

In this study, we evaluate the accuracy, efficiency, and cost-effectiveness of large language models in extracting and structuring information from free-text clinical reports, particularly in identifying and classifying patient comorbidities within oncology electronic health records.

OBJECTIVE

We specifically compare the performance of gpt-3.5-turbo-1106 and gpt-4-1106-preview models against that of specialized human evaluators.

METHODS

We implemented a script using the OpenAI application programming interface to extract structured information, in JavaScript Object Notation (JSON) format, on the comorbidities reported in 250 personal history reports. These reports were also manually reviewed in batches of 50 by 5 specialists in radiation oncology. We compared the results using metrics such as sensitivity, specificity, precision, accuracy, F-value, κ index, and the McNemar test, and we examined the common causes of errors in both humans and generative pretrained transformer (GPT) models.
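The comparison metrics named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' script: the function names, the toy labels, and the choice of the exact (binomial) form of the McNemar test are assumptions made for illustration, and the abstract does not state which variant of the test or of the κ index was used (Cohen's κ is assumed here).

```python
from math import comb


def confusion(truth, pred):
    """Return (tp, fp, fn, tn) for two equal-length sequences of 0/1 labels."""
    tp = fp = fn = tn = 0
    for t, p in zip(truth, pred):
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def agreement_metrics(truth, pred):
    """Sensitivity, specificity, precision, accuracy, F1 and Cohen's kappa.

    Assumes both classes occur in truth and pred, so no denominator is zero.
    """
    tp, fp, fn, tn = confusion(truth, pred)
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / n
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Chance agreement: both say "present" plus both say "absent".
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "kappa": kappa,
    }


def mcnemar_exact(truth, pred_a, pred_b):
    """Two-sided exact McNemar p-value comparing the correctness of two raters."""
    only_a = sum(a == t and b != t for t, a, b in zip(truth, pred_a, pred_b))
    only_b = sum(a != t and b == t for t, a, b in zip(truth, pred_a, pred_b))
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy labels for illustration only; these are not data from the study.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
rater_a = [1, 1, 1, 0, 0, 0, 0, 1]
rater_b = [1, 1, 1, 1, 0, 0, 0, 0]
print(agreement_metrics(truth, rater_a))
print(mcnemar_exact(truth, rater_a, rater_b))
```

Each label here stands for one comorbidity in one report (1 = present, 0 = absent). The McNemar test only looks at the discordant pairs, that is, the items one rater got right and the other got wrong, which is why it suits paired comparisons such as model versus physician on the same reports.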

RESULTS

The GPT-3.5 model exhibited slightly lower performance compared to physicians across all metrics, though the differences were not statistically significant (McNemar test, P=.79). GPT-4 demonstrated clear superiority in several key metrics (McNemar test, P<.001). Notably, it achieved a sensitivity of 96.8%, compared to 88.2% for GPT-3.5 and 88.8% for physicians. However, physicians marginally outperformed GPT-4 in precision (97.7% vs 96.8%). GPT-4 showed greater consistency, replicating the exact same results in 76% of the reports across 10 repeated analyses, compared to 59% for GPT-3.5, indicating more stable and reliable performance. Physicians were more likely to miss explicit comorbidities, while the GPT models more frequently inferred nonexplicit comorbidities, sometimes correctly, though this also resulted in more false positives.
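The consistency figure reported above (identical results across 10 repeated analyses) can be approximated with a short Python sketch. This is an assumption about how such a rate might be computed, not the authors' procedure: the function names, the report identifiers, and the JSON field names are hypothetical, and "exact same results" is taken here to mean that every run of a report yields the same parsed JSON object regardless of key order.

```python
import json


def canonical(result):
    """Serialize a parsed JSON result with sorted keys so key order is ignored."""
    return json.dumps(result, sort_keys=True, ensure_ascii=False)


def exact_repeat_rate(runs_by_report):
    """Fraction of reports whose repeated runs all yield the same output."""
    stable = sum(
        1
        for runs in runs_by_report.values()
        if len({canonical(r) for r in runs}) == 1
    )
    return stable / len(runs_by_report)


# Hypothetical outputs from three repeated runs on three reports.
runs = {
    "report_1": [{"diabetes": True, "copd": False}] * 3,
    "report_2": [{"diabetes": True}, {"diabetes": False}, {"diabetes": True}],
    "report_3": [{"copd": True, "diabetes": False}, {"diabetes": False, "copd": True}],
}
print(exact_repeat_rate(runs))
```

Comparing canonical serializations rather than raw response strings avoids counting harmless differences in key order or whitespace as inconsistency.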

CONCLUSIONS

This study demonstrates that, with well-designed prompts, the large language models examined can match or even surpass medical specialists in extracting information from complex clinical reports. Their superior efficiency in time and costs, along with easy integration with databases, makes them a valuable tool for large-scale data mining and real-world evidence generation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3a7/11739723/6cfee737949c/medinform_v13i1e58457_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3a7/11739723/5e2651e9c600/medinform_v13i1e58457_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3a7/11739723/cd17296fbd02/medinform_v13i1e58457_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3a7/11739723/4bbafc01f7e5/medinform_v13i1e58457_fig4.jpg
