Application of NotebookLM, a large language model with retrieval-augmented generation, for lung cancer staging.

Authors

Tozuka Ryota, Johno Hisashi, Amakawa Akitomo, Sato Junichi, Muto Mizuki, Seki Shoichiro, Komaba Atsushi, Onishi Hiroshi

Affiliations

Department of Radiology, University of Yamanashi, 1110 Shimokato, Chuo, Yamanashi, 409-3898, Japan.

Department of Radiation Oncology, Tohoku University Graduate School of Medicine, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8574, Japan.

Publication information

Jpn J Radiol. 2025 Apr;43(4):706-712. doi: 10.1007/s11604-024-01705-1. Epub 2024 Nov 25.

DOI:10.1007/s11604-024-01705-1
PMID:39585559
Abstract

PURPOSE

In radiology, large language models (LLMs), including ChatGPT, have recently gained attention, and their utility is being rapidly evaluated. However, concerns have emerged regarding their reliability in clinical applications due to limitations such as hallucinations and insufficient referencing. To address these issues, we focus on the latest technology, retrieval-augmented generation (RAG), which enables LLMs to reference reliable external knowledge (REK). Specifically, this study examines the utility and reliability of a recently released RAG-equipped LLM (RAG-LLM), NotebookLM, for staging lung cancer.

MATERIALS AND METHODS

We summarized the current lung cancer staging guideline in Japan and provided this as REK to NotebookLM. We then tasked NotebookLM with staging 100 fictional lung cancer cases based on CT findings and evaluated its accuracy. For comparison, we performed the same task using a gold-standard LLM, GPT-4 Omni (GPT-4o), both with and without the REK. For GPT-4o, the REK was provided directly within the prompt rather than through RAG.
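The difference between supplying the REK directly in the prompt and supplying it through retrieval can be illustrated with a toy sketch. The abstract does not describe how NotebookLM retrieves passages, so everything below is an assumption made for illustration: the guideline snippets in `REK_CHUNKS` are simplified placeholders rather than the guideline summary used in the study, the word-overlap scoring stands in for whatever retrieval NotebookLM actually uses, and the function names are hypothetical.

```python
import re

# Hypothetical, simplified guideline snippets for illustration only;
# this is NOT the REK summary used in the study.
REK_CHUNKS = [
    "T1: tumor 3 cm or less in greatest dimension",
    "N2: metastasis in ipsilateral mediastinal lymph nodes",
    "M1a: malignant pleural effusion or contralateral lung nodule",
]


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(chunks, query, k=1):
    """Return the k (index, chunk) pairs sharing the most tokens with the query.

    Word overlap is a toy stand-in for a real retriever (assumption).
    """
    query_tokens = tokenize(query)
    scored = [(len(tokenize(c) & query_tokens), i, c) for i, c in enumerate(chunks)]
    # Highest overlap first; ties are broken by the earlier chunk index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(i, c) for _, i, c in scored[:k]]


def prompt_with_full_rek(chunks, case):
    """In-prompt approach: paste the entire REK ahead of the case description."""
    return "\n".join(chunks) + "\n\nCase: " + case


def prompt_with_retrieval(chunks, case, k=1):
    """RAG-style approach: include only retrieved chunks, tagged with their location."""
    hits = retrieve(chunks, case, k)
    context = "\n".join(f"[chunk {i}] {c}" for i, c in hits)
    return context + "\n\nCase: " + case


case = "CT shows enlarged ipsilateral mediastinal lymph nodes"
print(prompt_with_retrieval(REK_CHUNKS, case))
```

Tagging each retrieved chunk with its index mirrors, in miniature, the idea of returning a reference location that a reader can check against the source.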

RESULTS

NotebookLM achieved 86% diagnostic accuracy in the lung cancer staging experiment, outperforming GPT-4o, which recorded 39% accuracy with the REK and 25% without it. Moreover, NotebookLM demonstrated 95% accuracy in searching reference locations within the REK.
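As a minimal sketch of how such an accuracy figure is computed, the snippet below compares predicted stages with reference stages case by case. The function name and the four toy labels are hypothetical and are not study data. The second part only converts the staging accuracies reported above into counts of correct cases, using the 100 cases stated in the methods.

```python
def accuracy(predicted, reference):
    """Fraction of cases where the predicted stage equals the reference stage."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one case is required")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Toy labels for illustration only (not study data).
reference = ["IA1", "IIB", "IIIA", "IVA"]
predicted = ["IA1", "IIB", "IIIB", "IVA"]
print(accuracy(predicted, reference))  # 0.75

# Staging accuracies as reported in the abstract, over 100 fictional cases.
N_CASES = 100
reported = {
    "NotebookLM": 0.86,
    "GPT-4o with REK": 0.39,
    "GPT-4o without REK": 0.25,
}
correct_counts = {name: round(acc * N_CASES) for name, acc in reported.items()}
print(correct_counts)
```

With 100 cases, each percentage point corresponds to exactly one case, so the reported figures translate to 86, 39, and 25 correctly staged cases respectively.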

CONCLUSION

NotebookLM, a RAG-LLM, successfully performed lung cancer staging by utilizing the REK, demonstrating superior performance compared to GPT-4o (without RAG). Additionally, it provided highly accurate reference locations within the REK, allowing radiologists to efficiently evaluate the reliability of NotebookLM's responses and detect possible hallucinations. Overall, this study highlights the potential of NotebookLM, a RAG-LLM, in image diagnosis.

