Enhancing Oncological Surveillance Through Large Language Model-Assisted Analysis: A Comparative Study of GPT-4 and Gemini in Evaluating Oncological Issues From Serial Abdominal CT Scan Reports.

Authors

Han Na Yeon, Shin Keewon, Kim Min Ju, Park Beom Jin, Sim Ki Choon, Han Yeo Eun, Sung Deuk Jae, Choi Jae Woong, Yeom Suk Keu

Affiliations

Department of Radiology, Korea University Anam Hospital, Korea University College of Medicine, 73 Goryeodae-ro, Seongbuk-gu, Seoul, Republic of Korea (N.Y.H., M.J.K., B.J.P., K.C.S., Y.E.H., D.J.S.).

Center for AI and Digital Healthcare Research, Korea University Anam Hospital, Korea University College of Medicine, Seoul, Republic of Korea (K.S.).

Publication information

Acad Radiol. 2025 May;32(5):2385-2391. doi: 10.1016/j.acra.2024.10.050. Epub 2024 Dec 9.

Abstract

RATIONALE AND OBJECTIVES

We aimed to compare the capabilities of two leading large language models (LLMs), GPT-4 and Gemini, in analyzing serial radiology reports, to highlight oncological issues that require further clinical attention.

MATERIALS AND METHODS

This study included 205 patients, each with two consecutive radiological reports. We designed a prompt comprising a three-step task to analyze report findings using LLMs. To establish a ground truth, two radiologists reached a consensus on a six-level categorization, comprising tumor findings (categorized as improved, stable, or aggravated), "benign", "no tumor description," and "other malignancy." The performance of GPT-4 and Gemini was then compared based on their ability to match corresponding findings between two radiological reports and accurately reflect these categories.

RESULTS

In matching findings between serial reports, the proportion of correctly matched findings was significantly higher for GPT-4 (96.2%) than for Gemini (91.7%) (P < 0.01). For oncological issue identification (determining whether a finding was tumor-related), GPT-4 and Gemini achieved precision of 0.68 and 0.63 (P = 0.006), recall of 0.91 and 0.80 (P < 0.001), and F1-scores of 0.78 and 0.70, respectively. GPT-4 was also more accurate than Gemini in determining the correct tumor status for tumor-related findings (P < 0.001).
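
The reported F1-scores can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch, not the authors' code; the function name `f1_score` is an illustrative assumption, and only the rounded values from the abstract are used as inputs.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall values as reported in the abstract.
reported = {
    "GPT-4": (0.68, 0.91),
    "Gemini": (0.63, 0.80),
}

for model, (precision, recall) in reported.items():
    print(f"{model}: F1 = {f1_score(precision, recall):.2f}")
# GPT-4: F1 = 0.78
# Gemini: F1 = 0.70
```

Both recomputed values agree with the F1-scores stated in the abstract to two decimal places.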

CONCLUSION

This study demonstrated the potential of LLM-assisted analysis of serial radiology reports in enhancing oncological surveillance, using a carefully engineered prompt. GPT-4 showed superior performance compared to Gemini in matching corresponding findings, identifying tumor-related findings, and accurately determining tumor status.
