Large language models as a diagnostic support tool in neuropathology.

Affiliations

Else Kröner Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.

Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.

Publication information

J Pathol Clin Res. 2024 Nov;10(6):e70009. doi: 10.1002/2056-4538.70009.

DOI: 10.1002/2056-4538.70009

PMID: 39505569

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11540532/

Abstract

The WHO guidelines for classifying central nervous system (CNS) tumours change considerably with each release. The classification of CNS tumours is more complex than that of most other solid tumours because it incorporates not just morphology but also genetic and epigenetic features. Keeping current with these changes across medical fields can be challenging, even for clinical specialists. Large language models (LLMs) have demonstrated their ability to parse and process complex medical text, but their utility in neuro-oncology has not been systematically tested. We hypothesised that LLMs can effectively diagnose neuro-oncology cases from free-text histopathology reports according to the latest WHO guidelines. To test this hypothesis, we evaluated the performance of ChatGPT-4o, Claude-3.5-sonnet, and Llama3 across 30 challenging neuropathology cases, each of which presented a complex mix of morphological and genetic information relevant to the diagnosis. Furthermore, we integrated these models with the latest WHO guidelines through Retrieval-Augmented Generation (RAG) and again assessed their diagnostic accuracy. Our data show that LLMs equipped with RAG, but not those without it, can accurately diagnose the neuropathological tumour subtype in 90% of the tested cases. This study lays the groundwork for a new generation of computational tools that can assist neuropathologists in their daily reporting practice.
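The retrieval step behind RAG can be illustrated with a minimal sketch. The abstract does not describe the study's retriever, chunking, or prompt, so everything below is an assumption: a toy bag-of-words cosine retriever stands in for the embedding-based retrieval typical of RAG pipelines, `GUIDELINE_EXCERPTS` contains short simplified placeholders rather than quoted WHO text, and `build_prompt` is a hypothetical template. The resulting string would then be sent to an LLM such as ChatGPT-4o, Claude-3.5-sonnet, or Llama3; that call is omitted. This is not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical, simplified placeholder excerpts (NOT quoted WHO text).
# A real system would chunk the full WHO CNS tumour classification.
GUIDELINE_EXCERPTS = [
    "Astrocytoma, IDH-mutant: diffuse glioma with IDH1 or IDH2 mutation and no 1p/19q codeletion.",
    "Oligodendroglioma, IDH-mutant and 1p/19q-codeleted: diffuse glioma with IDH mutation and 1p/19q codeletion.",
    "Glioblastoma, IDH-wildtype: adult diffuse astrocytic glioma without IDH mutation, "
    "with TERT promoter mutation or EGFR amplification.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; hyphenated terms such as 'idh-wildtype' stay whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(report: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k guideline chunks most similar to the free-text report."""
    query = Counter(tokenize(report))
    ranked = sorted(chunks, key=lambda c: cosine(query, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(report: str, chunks: list[str], k: int = 2) -> str:
    """Hypothetical prompt: retrieved guideline excerpts followed by the report."""
    context = "\n\n".join(retrieve(report, chunks, k))
    return (
        "Using the WHO CNS tumour classification excerpts below, "
        "give the integrated diagnosis for this report.\n\n"
        f"Guideline excerpts:\n{context}\n\nReport:\n{report}\n\nDiagnosis:"
    )


if __name__ == "__main__":
    example_report = "Glioma, IDH-wildtype, EGFR amplification and TERT promoter mutation detected."
    print(build_prompt(example_report, GUIDELINE_EXCERPTS, k=1))
```

For the example report, the glioblastoma excerpt shares the most terms and is the one placed in the prompt. In practice, the word-count retriever would usually be replaced by dense embeddings and a vector index.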

Similar articles

Large language models as a diagnostic support tool in neuropathology.

J Pathol Clin Res. 2024 Nov;10(6):e70009. doi: 10.1002/2056-4538.70009.

Evaluating text and visual diagnostic capabilities of large language models on questions related to the Breast Imaging Reporting and Data System Atlas 5 edition.

Diagn Interv Radiol. 2025 Mar 3;31(2):111-129. doi: 10.4274/dir.2024.242876. Epub 2024 Sep 9.

Can large language models be new supportive tools in coronary computed tomography angiography reporting?

Clin Imaging. 2024 Oct;114:110271. doi: 10.1016/j.clinimag.2024.110271. Epub 2024 Aug 31.

Assessing the feasibility of ChatGPT-4o and Claude 3-Opus in thyroid nodule classification based on ultrasound images.

Endocrine. 2025 Mar;87(3):1041-1049. doi: 10.1007/s12020-024-04066-x. Epub 2024 Oct 11.

Development of a liver disease-specific large language model chat interface using retrieval-augmented generation.

Hepatology. 2024 Nov 1;80(5):1158-1168. doi: 10.1097/HEP.0000000000000834. Epub 2024 Mar 7.

Optimizing large language models in digestive disease: strategies and challenges to improve clinical outcomes.

Liver Int. 2024 Sep;44(9):2114-2124. doi: 10.1111/liv.15974. Epub 2024 May 31.

Assessing Retrieval-Augmented Large Language Model Performance in Emergency Department ICD-10-CM Coding Compared to Human Coders.

medRxiv. 2024 Oct 17:2024.10.15.24315526. doi: 10.1101/2024.10.15.24315526.

Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models.

JMIR Med Educ. 2024 Feb 13;10:e51391. doi: 10.2196/51391.

Diagnostic performances of GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro in "Diagnosis Please" cases.

Jpn J Radiol. 2024 Nov;42(11):1231-1235. doi: 10.1007/s11604-024-01619-y. Epub 2024 Jul 1.

Diffuse gliomas to date and beyond 2016 WHO Classification of Tumours of the Central Nervous System.

Int J Clin Oncol. 2020 Jun;25(6):997-1003. doi: 10.1007/s10147-020-01695-w. Epub 2020 May 28.

Cited by

Large Language Models in Neurology Treatment Decision-Making: a Scoping Review.

J Med Syst. 2025 Sep 16;49(1):115. doi: 10.1007/s10916-025-02254-4.

Diagnostic Performance of ChatGPT-4.0 in Histopathological Analysis of Gliomas: A Single Institution Experience.

Neuropathology. 2025 Aug;45(4):e70023. doi: 10.1111/neup.70023.

Large Language Models in Medical Diagnostics: Scoping Review With Bibliometric Analysis.

J Med Internet Res. 2025 Jun 9;27:e72062. doi: 10.2196/72062.

Authors' reply: Re: Koga et al. Retrieval-augmented generation versus document-grounded generation: a key distinction in large language models.

J Pathol Clin Res. 2025 Jan;11(1):e70013. doi: 10.1002/2056-4538.70013.

Retrieval-augmented generation versus document-grounded generation: a key distinction in large language models.

J Pathol Clin Res. 2025 Jan;11(1):e70014. doi: 10.1002/2056-4538.70014.

Improving large language model applications in biomedicine with retrieval-augmented generation: a systematic review, meta-analysis, and clinical development guidelines.

J Am Med Inform Assoc. 2025 Apr 1;32(4):605-615. doi: 10.1093/jamia/ocaf008.

Exploring the Potential of Claude 3 Opus in Renal Pathological Diagnosis: Performance Evaluation.

JMIR Med Inform. 2024 Nov 15;12:e65033. doi: 10.2196/65033.
