Enhancing Oncological Surveillance Through Large Language Model-Assisted Analysis: A Comparative Study of GPT-4 and Gemini in Evaluating Oncological Issues From Serial Abdominal CT Scan Reports.

Authors

Han Na Yeon, Shin Keewon, Kim Min Ju, Park Beom Jin, Sim Ki Choon, Han Yeo Eun, Sung Deuk Jae, Choi Jae Woong, Yeom Suk Keu

Affiliations

Department of Radiology, Korea University Anam Hospital, Korea University College of Medicine, 73 Goryeodae-ro, Seongbuk-gu, Seoul, Republic of Korea (N.Y.H., M.J.K., B.J.P., K.C.S., Y.E.H., D.J.S.).

Center for AI and Digital Healthcare Research, Korea University Anam Hospital, Korea University College of Medicine, Seoul, Republic of Korea (K.S.).

Publication information

Acad Radiol. 2025 May;32(5):2385-2391. doi: 10.1016/j.acra.2024.10.050. Epub 2024 Dec 9.

DOI:10.1016/j.acra.2024.10.050
PMID:39658474
Abstract

RATIONALE AND OBJECTIVES

We aimed to compare the capabilities of two leading large language models (LLMs), GPT-4 and Gemini, in analyzing serial radiology reports, to highlight oncological issues that require further clinical attention.

MATERIALS AND METHODS

This study included 205 patients, each with two consecutive radiological reports. We designed a prompt comprising a three-step task to analyze report findings using LLMs. To establish a ground truth, two radiologists reached a consensus on a six-level categorization, comprising tumor findings (categorized as improved, stable, or aggravated), "benign", "no tumor description," and "other malignancy." The performance of GPT-4 and Gemini was then compared based on their ability to match corresponding findings between two radiological reports and accurately reflect these categories.
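The six-level ground truth described above can be pictured as a small enumeration, with precision and recall computed on the binary question "is this finding tumor-related?". This is only a sketch: the names `FindingCategory`, `TUMOR_RELATED`, and `tumor_precision_recall` are hypothetical, the abstract does not say how the authors encoded labels or counted hits, and treating only the three tumor statuses (not "other malignancy") as tumor-related is an assumption.

```python
from enum import Enum


class FindingCategory(Enum):
    """Six categories named in the abstract (the encoding is hypothetical)."""
    TUMOR_IMPROVED = "improved"
    TUMOR_STABLE = "stable"
    TUMOR_AGGRAVATED = "aggravated"
    BENIGN = "benign"
    NO_TUMOR_DESCRIPTION = "no tumor description"
    OTHER_MALIGNANCY = "other malignancy"


# Assumption: only the three tumor-status categories count as tumor-related.
TUMOR_RELATED = frozenset({
    FindingCategory.TUMOR_IMPROVED,
    FindingCategory.TUMOR_STABLE,
    FindingCategory.TUMOR_AGGRAVATED,
})


def tumor_precision_recall(truth, predicted):
    """Precision and recall for flagging a finding as tumor-related.

    `truth` and `predicted` are equal-length sequences of FindingCategory,
    one entry per matched finding.
    """
    tp = fp = fn = 0
    for t, p in zip(truth, predicted):
        t_tumor = t in TUMOR_RELATED
        p_tumor = p in TUMOR_RELATED
        if p_tumor and t_tumor:
            tp += 1
        elif p_tumor:
            fp += 1
        elif t_tumor:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Called with the radiologists' consensus labels as `truth` and one model's labels as `predicted`, the function returns that model's precision and recall for tumor-related findings under the assumptions stated above.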

RESULTS

GPT-4 correctly matched a significantly higher proportion of findings between serial reports than Gemini (96.2% vs. 91.7%; P < 0.01). For identifying tumor-related findings, GPT-4 and Gemini achieved precision of 0.68 and 0.63 (P = 0.006), recall of 0.91 and 0.80 (P < 0.001), and F1-scores of 0.78 and 0.70, respectively. GPT-4 was also more accurate than Gemini in determining the correct tumor status of tumor-related findings (P < 0.001).
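As a quick consistency check, the reported F1-scores can be recomputed from the reported precision and recall using the standard harmonic-mean definition, F1 = 2PR / (P + R). The helper name `f1_score` is our own, and because the inputs are the rounded values from the abstract, the recomputed figures are approximate.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract, already rounded.
reported = {"GPT-4": (0.68, 0.91), "Gemini": (0.63, 0.80)}

for model, (p, r) in reported.items():
    print(f"{model}: F1 = {f1_score(p, r):.2f}")
# prints:
# GPT-4: F1 = 0.78
# Gemini: F1 = 0.70
```

Both recomputed values agree with the F1-scores stated in the abstract to two decimal places.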

CONCLUSION

This study demonstrated the potential of LLM-assisted analysis of serial radiology reports in enhancing oncological surveillance, using a carefully engineered prompt. GPT-4 showed superior performance compared to Gemini in matching corresponding findings, identifying tumor-related findings, and accurately determining tumor status.

Similar articles

1
Enhancing Oncological Surveillance Through Large Language Model-Assisted Analysis: A Comparative Study of GPT-4 and Gemini in Evaluating Oncological Issues From Serial Abdominal CT Scan Reports.
Acad Radiol. 2025 May;32(5):2385-2391. doi: 10.1016/j.acra.2024.10.050. Epub 2024 Dec 9.
2
Assessing the performance of ChatGPT and Bard/Gemini against radiologists for Prostate Imaging-Reporting and Data System classification based on prostate multiparametric MRI text reports.
Br J Radiol. 2025 Mar 1;98(1167):368-374. doi: 10.1093/bjr/tqae236.
3
Use of ChatGPT Large Language Models to Extract Details of Recommendations for Additional Imaging From Free-Text Impressions of Radiology Reports.
AJR Am J Roentgenol. 2025 Apr;224(4):e2432341. doi: 10.2214/AJR.24.32341. Epub 2025 Jan 29.
4
Large Language Models for Automated Synoptic Reports and Resectability Categorization in Pancreatic Cancer.
Radiology. 2024 Jun;311(3):e233117. doi: 10.1148/radiol.233117.
5
Lung Cancer Staging Using Chest CT and FDG PET/CT Free-Text Reports: Comparison Among Three ChatGPT Large Language Models and Six Human Readers of Varying Experience.
AJR Am J Roentgenol. 2024 Dec;223(6):e2431696. doi: 10.2214/AJR.24.31696. Epub 2024 Sep 4.
6
ChatGPT vs. Gemini: Comparative accuracy and efficiency in Lung-RADS score assignment from radiology reports.
Clin Imaging. 2025 May;121:110455. doi: 10.1016/j.clinimag.2025.110455. Epub 2025 Mar 13.
7
Provision of Radiology Reports Simplified With Large Language Models to Patients With Cancer: Impact on Patient Satisfaction.
JCO Clin Cancer Inform. 2025 Jan;9:e2400166. doi: 10.1200/CCI-24-00166. Epub 2025 Jan 29.
8
Privacy-ensuring Open-weights Large Language Models Are Competitive with Closed-weights GPT-4o in Extracting Chest Radiography Findings from Free-Text Reports.
Radiology. 2025 Jan;314(1):e240895. doi: 10.1148/radiol.240895.
9
Large Language Models for Simplified Interventional Radiology Reports: A Comparative Analysis.
Acad Radiol. 2025 Feb;32(2):888-898. doi: 10.1016/j.acra.2024.09.041. Epub 2024 Sep 30.
10
Evaluating AI proficiency in nuclear cardiology: Large language models take on the board preparation exam.
J Nucl Cardiol. 2025 Mar;45:102089. doi: 10.1016/j.nuclcard.2024.102089. Epub 2024 Nov 29.

Cited by

1
Systematic benchmarking of large Language models in programmed cell death-oriented gastric cancer research: a comparative analysis of DeepSeek‑V3, DeepSeek‑R1, and Claude 3.5.
Discov Oncol. 2025 Jul 1;16(1):1227. doi: 10.1007/s12672-025-02911-7.