Else Kröner Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.
Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.
J Pathol Clin Res. 2024 Nov;10(6):e70009. doi: 10.1002/2056-4538.70009.
The WHO guidelines for classifying central nervous system (CNS) tumours change considerably with each release. Classifying CNS tumours is unusually complex compared with most other solid tumours, as it incorporates not just morphology but also genetic and epigenetic features. Keeping current with these changes can be challenging, even for clinical specialists. Large language models (LLMs) have demonstrated their ability to parse and process complex medical text, but their utility in neuro-oncology has not been systematically tested. We hypothesised that LLMs can effectively diagnose neuro-oncology cases from free-text histopathology reports according to the latest WHO guidelines. To test this hypothesis, we evaluated the performance of ChatGPT-4o, Claude-3.5-sonnet, and Llama3 across 30 challenging neuropathology cases, each of which presented a complex mix of morphological and genetic information relevant to the diagnosis. Furthermore, we integrated these models with the latest WHO guidelines through Retrieval-Augmented Generation (RAG) and again assessed their diagnostic accuracy. Our data show that LLMs equipped with RAG, but not those without it, can accurately diagnose the neuropathological tumour subtype in 90% of the tested cases. This study lays the groundwork for a new generation of computational tools that can assist neuropathologists in their daily reporting practice.