
Conversion of Mixed-Language Free-Text CT Reports of Pancreatic Cancer to National Comprehensive Cancer Network Structured Reporting Templates by Using GPT-4.

Author information

Kim Hokun, Kim Bohyun, Choi Moon Hyung, Choi Joon-Il, Oh Soon Nam, Rha Sung Eun

Affiliations

Department of Radiology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea.

Department of Radiology, Eunpyeong St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea.

Publication information

Korean J Radiol. 2025 Jun;26(6):557-568. doi: 10.3348/kjr.2024.1228. Epub 2025 Apr 17.

PMID:40288895
Abstract

OBJECTIVE

To evaluate the feasibility of generative pre-trained transformer-4 (GPT-4) in generating structured reports (SRs) from mixed-language (English and Korean) narrative-style CT reports for pancreatic ductal adenocarcinoma (PDAC) and to assess its accuracy in categorizing PDAC resectability.

MATERIALS AND METHODS

This retrospective study included consecutive free-text reports of pancreas-protocol CT performed for staging PDAC at two institutions between January 2021 and December 2023, written in English or Korean. The GPT-4 Turbo and GPT-4o models were each given prompts together with the free-text reports through an application programming interface (API) and tasked with generating SRs and categorizing tumor resectability according to the National Comprehensive Cancer Network (NCCN) guidelines, version 2.2024. Prompts were optimized using the GPT-4 Turbo model and 50 reports from Institution B. The performance of the GPT-4 Turbo and GPT-4o models on the two tasks was evaluated using 115 reports from Institution A. Results were compared with a reference standard manually derived by an abdominal radiologist. Each report was processed three consecutive times, and the most frequent response was selected as the final output. Error analysis was guided by the decision rationale provided by the models.
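The consensus step in the methods (three runs per report, most frequent answer kept as the final output) can be sketched in Python as below. This is a minimal illustration, not the authors' code: the function names, report IDs, and the tie-breaking rule are hypothetical, because the abstract does not say how a three-way disagreement was resolved. Scoring by exact string match against the reference is also a simplification of the radiologist-derived reference standard.

```python
from collections import Counter
from typing import Dict, List, Sequence


def majority_vote(responses: Sequence[str]) -> str:
    """Return the most frequent answer among repeated runs of one report.

    Hypothetical tie rule: if several answers share the top count (for
    example, three runs giving three different answers), the earliest
    run's answer wins. The abstract does not specify a tie rule.
    """
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(responses)
    top = max(counts.values())
    # Scan in run order so that ties resolve to the earliest answer.
    for answer in responses:
        if counts[answer] == top:
            return answer
    raise AssertionError("unreachable: some answer must reach the top count")


def final_outputs(runs_per_report: Dict[str, List[str]]) -> Dict[str, str]:
    """Collapse each report's repeated runs into one final output."""
    return {rid: majority_vote(runs) for rid, runs in runs_per_report.items()}


def accuracy(predicted: Dict[str, str], reference: Dict[str, str]) -> float:
    """Fraction of reference entries whose final output matches exactly."""
    hits = sum(predicted[key] == value for key, value in reference.items())
    return hits / len(reference)


if __name__ == "__main__":
    # Hypothetical runs for two reports (three API calls each).
    runs = {
        "report-001": ["resectable", "resectable", "borderline resectable"],
        "report-002": ["locally advanced", "borderline resectable", "resectable"],
    }
    reference = {"report-001": "resectable", "report-002": "borderline resectable"}
    final = final_outputs(runs)
    print(final)
    print(accuracy(final, reference))  # 0.5
```

With an odd number of runs and only two distinct answers a strict majority always exists; the tie rule only matters when all three runs disagree, which is exactly the case the abstract leaves open.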

RESULTS

Of the 115 narrative reports tested, 96 (83.5%) contained both English and Korean. For SR generation, GPT-4 Turbo and GPT-4o demonstrated comparable accuracies (92.3% [1592/1725] and 92.2% [1590/1725], respectively; P = 0.923). In resectability categorization, GPT-4 Turbo showed higher accuracy than GPT-4o (81.7% [94/115] vs. 67.0% [77/115], respectively; P = 0.002). In the error analysis of GPT-4 Turbo, the SR generation error rate was 7.7% (133/1725 items), primarily attributable to inaccurate data extraction (54.1% [72/133]). The resectability categorization error rate was 18.3% (21/115), with the main cause being violation of the resectability criteria (61.9% [13/21]).
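The percentages in the results can be re-derived from the raw counts. The short Python check below (illustrative only; the names `REPORTED`, `pct`, and `mismatches` are hypothetical) recomputes each figure to one decimal place. It also shows that GPT-4 Turbo's correct and erroneous counts sum to the totals (1592 + 133 = 1725; 94 + 21 = 115) and that 1725 = 115 × 15, which is consistent with 15 scored template items per report, although the abstract does not state this explicitly.

```python
from typing import List, Tuple

# Counts exactly as reported in the abstract:
# (description, numerator, denominator, reported percentage).
REPORTED: List[Tuple[str, int, int, float]] = [
    ("GPT-4 Turbo, SR items correct", 1592, 1725, 92.3),
    ("GPT-4o, SR items correct", 1590, 1725, 92.2),
    ("GPT-4 Turbo, resectability correct", 94, 115, 81.7),
    ("GPT-4o, resectability correct", 77, 115, 67.0),
    ("GPT-4 Turbo, SR item errors", 133, 1725, 7.7),
    ("  of which inaccurate data extraction", 72, 133, 54.1),
    ("GPT-4 Turbo, resectability errors", 21, 115, 18.3),
    ("  of which criteria violations", 13, 21, 61.9),
]


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * numerator / denominator, 1)


def mismatches(rows: List[Tuple[str, int, int, float]]) -> List[str]:
    """Return the descriptions whose recomputed percentage differs."""
    return [desc for desc, num, den, reported in rows if pct(num, den) != reported]


if __name__ == "__main__":
    for desc, num, den, reported in REPORTED:
        print(f"{desc}: {num}/{den} = {pct(num, den)}% (reported {reported}%)")
    # Internal consistency: correct + errors cover every item / report.
    print(1592 + 133 == 1725, 94 + 21 == 115, 1725 == 115 * 15)
    print("mismatches:", mismatches(REPORTED))
```

The P values cannot be recomputed the same way: a paired comparison of the two models would presumably need per-report agreement data, which the abstract does not give.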

CONCLUSION

Both GPT-4 Turbo and GPT-4o demonstrated acceptable accuracy in generating NCCN-based SRs for PDAC from mixed-language narrative reports. However, oversight by human radiologists remains essential for determining resectability from CT findings.
