Geevarghese Ruben, Sigel Carlie, Cadley John, Chatterjee Subrata, Jain Pulkit, Hollingsworth Alex, Chatterjee Avijit, Swinburne Nathaniel, Bilal Khawaja Hasan, Marinelli Brett
Division of Interventional Radiology, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York, USA.
Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA.
J Clin Pathol. 2025 Jan 17;78(2):135-138. doi: 10.1136/jcp-2024-209669.
Structured reporting in pathology is not universally adopted, and extracting elements essential to research often requires expensive and time-intensive manual curation. This study examines the accuracy and feasibility of using large language models (LLMs) to extract essential pathology elements for cancer research.
This was a retrospective study of patients who underwent pathology sampling for suspected hepatocellular carcinoma and underwent Yttrium-90 embolisation. Five pathology report elements of interest were included for evaluation. LLMs (Generative Pre-trained Transformer (GPT) 3.5 turbo and GPT-4) were used to extract the elements of interest. For comparison, a rules-based approach using regular expressions (REGEX) was devised for extraction. Accuracy was calculated for each approach.
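As an illustration of the rules-based comparison and the accuracy calculation described above, the following is a minimal Python sketch, not the study's actual implementation. The abstract does not name the five elements or the rules used, so the element names (`diagnosis`, `grade`), the patterns, and the two sample reports are hypothetical and synthetic. Accuracy is assumed here to be the fraction of report-element pairs where the extracted value matches a manually curated label.

```python
import re

# Hypothetical patterns: the abstract does not list the five elements
# or the regular expressions used in the study.
PATTERNS = {
    "diagnosis": re.compile(r"hepatocellular carcinoma", re.IGNORECASE),
    "grade": re.compile(
        r"\b(well|moderately|poorly)[\s-]+differentiated\b", re.IGNORECASE
    ),
}


def extract_elements(report):
    """Apply each pattern to the report text; None means 'not found'."""
    found = {}
    match = PATTERNS["diagnosis"].search(report)
    found["diagnosis"] = "hepatocellular carcinoma" if match else None
    match = PATTERNS["grade"].search(report)
    found["grade"] = match.group(1).lower() if match else None
    return found


def accuracy(predictions, references):
    """Fraction of (report, element) pairs matching the manual label."""
    total = 0
    correct = 0
    for pred, ref in zip(predictions, references):
        for key, expected in ref.items():
            total += 1
            correct += int(pred.get(key) == expected)
    return correct / total if total else 0.0


# Synthetic example reports and labels, invented for illustration only.
reports = [
    "Liver, core biopsy: Hepatocellular carcinoma, moderately differentiated.",
    "Liver, biopsy: Benign hepatic parenchyma with steatosis.",
]
labels = [
    {"diagnosis": "hepatocellular carcinoma", "grade": "moderately"},
    {"diagnosis": None, "grade": None},
]

predictions = [extract_elements(r) for r in reports]
print(accuracy(predictions, labels))
```

The same `accuracy` function could score LLM output, provided the model's responses are first parsed into the same dictionary shape, which makes the two approaches directly comparable on identical labels.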
88 pathology reports were identified. LLMs and REGEX were both able to extract research elements with high accuracy (average 84.1%-94.8%).
LLMs have significant potential to simplify the extraction of research elements from pathology reports and thereby accelerate the pace of cancer research.