
Feasibility of large language models for CEUS LI-RADS categorization of small liver nodules in patients at risk for hepatocellular carcinoma.

Authors

Huang Jiayan, Yang Rui, Huang Xiaotong, Zeng Keyu, Liu Yan, Luo Jun, Lyshchik Andrej, Lu Qiang

Affiliations

West China Hospital of Sichuan University, Chengdu, China.

Department of Ultrasound, Affiliated Hospital of Panzhihua University, Panzhihua, China.

Publication information

Front Oncol. 2024 Dec 18;14:1513608. doi: 10.3389/fonc.2024.1513608. eCollection 2024.

Abstract

BACKGROUND

Large language models (LLMs) offer opportunities to enhance radiological applications, but their performance in handling complex tasks remains insufficiently investigated.

PURPOSE

To evaluate the performance of LLMs integrated with Contrast-enhanced Ultrasound Liver Imaging Reporting and Data System (CEUS LI-RADS) in diagnosing small (≤20mm) hepatocellular carcinoma (sHCC) in high-risk patients.

MATERIALS AND METHODS

This retrospective study included high-risk HCC patients with untreated small (≤20 mm) focal liver lesions (sFLLs) examined from November 2014 to December 2023. ChatGPT-4.0, ChatGPT-4o, ChatGPT-4o mini, and Google Gemini were given imaging features from structured CEUS LI-RADS reports, and their diagnostic performance for sHCC was assessed. The diagnostic efficacy of the LLMs for sHCC was compared using the McNemar test.
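The McNemar comparison mentioned above can be illustrated with a minimal sketch. It assumes the exact (binomial) form of the test applied to paired per-lesion calls; the abstract does not say which variant the authors used, and the function names and toy data below are hypothetical, not taken from the study.

```python
from math import comb


def discordant_counts(calls_a, calls_b):
    """Count paired cases where exactly one of the two readers is positive.

    calls_a, calls_b: equal-length sequences of 0/1 calls on the same lesions
    (for a sensitivity comparison, restrict both to pathology-proven HCC).
    Returns (b, c): b = A positive and B negative, c = A negative and B positive.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("paired sequences must have equal length")
    b = sum(1 for x, y in zip(calls_a, calls_b) if x and not y)
    c = sum(1 for x, y in zip(calls_a, calls_b) if y and not x)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) lower tail up to k, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy data: calls by two models on eight HCC lesions.
model_a = [1, 1, 1, 1, 0, 1, 1, 1]
model_b = [1, 0, 0, 1, 0, 0, 1, 0]
b, c = discordant_counts(model_a, model_b)
p_value = mcnemar_exact(b, c)
```

Only the discordant pairs carry information here, which is why the test suits two readers evaluated on the same lesions rather than on independent samples.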

RESULTS

The final population consisted of 403 high-risk patients (mean age 52 ± 11 years; 323 men). ChatGPT-4.0 and ChatGPT-4o demonstrated substantial to almost perfect intra-agreement for CEUS LI-RADS categorization (κ: 0.76 to 1.0 and 0.7 to 0.94, respectively), outperforming ChatGPT-4o mini (κ: 0.51 to 0.72) and Google Gemini (κ: -0.04 to 0.47). ChatGPT-4.0 had higher sensitivity in detecting sHCC than ChatGPT-4o (83%-89% vs. 70%-78%, P < 0.02) with comparable specificity (76%-90% vs. 83%-86%, P > 0.05). Compared to human readers, ChatGPT-4.0 showed superior sensitivity (83%-89% vs. 63%-78%, P < 0.004) and comparable specificity (76%-90% vs. 90%-95%, P > 0.05) in diagnosing sHCC.
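To make the κ values above concrete, the following is a minimal sketch of an agreement statistic between two repeated categorizations of the same lesions. It assumes unweighted Cohen's kappa on two runs; the abstract does not state which kappa variant was used, and the function name and category lists are hypothetical illustrations rather than study data.

```python
from collections import Counter


def cohen_kappa(run_1, run_2):
    """Unweighted Cohen's kappa for two categorizations of the same items."""
    if len(run_1) != len(run_2) or not run_1:
        raise ValueError("runs must be non-empty and of equal length")
    n = len(run_1)
    # Observed agreement: fraction of items given the same category twice.
    p_observed = sum(a == b for a, b in zip(run_1, run_2)) / n
    # Chance agreement: product of marginal category frequencies, summed.
    counts_1, counts_2 = Counter(run_1), Counter(run_2)
    p_chance = sum(counts_1[k] * counts_2[k] for k in counts_1) / n ** 2
    if p_chance == 1.0:
        return 1.0  # both runs used a single identical category throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical toy data: two runs of one model over the same four lesions.
first_run = ["LR-5", "LR-3", "LR-5", "LR-M"]
second_run = ["LR-5", "LR-3", "LR-5", "LR-M"]
kappa_identical = cohen_kappa(first_run, second_run)
```

A value near 1 indicates that repeated runs reproduce the same category, a value near 0 indicates agreement no better than chance, and negative values (as in the lower end of the Gemini range) indicate agreement below chance.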

CONCLUSION

LLMs integrated with CEUS LI-RADS offer a potential tool for diagnosing sHCC in high-risk patients. ChatGPT-4.0 demonstrated satisfactory consistency in CEUS LI-RADS categorization, with higher sensitivity in diagnosing sHCC than human readers and comparable specificity.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5e6/11688206/c37ac2d9dd38/fonc-14-1513608-g001.jpg
