Extraction and classification of structured data from unstructured hepatobiliary pathology reports using large language models: a feasibility study compared with rules-based natural language processing.

Author information

Geevarghese Ruben, Sigel Carlie, Cadley John, Chatterjee Subrata, Jain Pulkit, Hollingsworth Alex, Chatterjee Avijit, Swinburne Nathaniel, Bilal Khawaja Hasan, Marinelli Brett

Affiliations

Division of Interventional Radiology, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York, USA.

Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA.

Publication information

J Clin Pathol. 2025 Jan 17;78(2):135-138. doi: 10.1136/jcp-2024-209669.

AIMS

Structured reporting in pathology is not universally adopted, and extracting elements essential to research often requires expensive and time-intensive manual curation. This study examines the accuracy and feasibility of using large language models (LLMs) to extract essential pathology elements for cancer research.

METHODS

Retrospective study of patients who underwent pathology sampling for suspected hepatocellular carcinoma and Yttrium-90 embolisation. Five pathology report elements of interest were included for evaluation. LLMs (Generative Pre-trained Transformer (GPT) 3.5 turbo and GPT-4) were used to extract the elements of interest. For comparison, a rules-based regular expressions (REGEX) approach was devised for extraction. Accuracy was calculated for each approach.
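
As a rough illustration of the rules-based arm and the accuracy calculation described above, the following is a minimal sketch only. The abstract does not name the five report elements or give the authors' patterns, so the element used here (tumour differentiation), the regular expression, the sample reports, and the function names are all hypothetical.

```python
import re
from typing import List, Optional

# Hypothetical pattern: the abstract does not list the five elements,
# so tumour differentiation is used purely as an illustration.
DIFFERENTIATION = re.compile(
    r"\b(well|moderately|poorly)[\s-]+differentiated\b", re.IGNORECASE
)


def extract_differentiation(report: str) -> Optional[str]:
    """Return the differentiation grade found in a report, or None."""
    match = DIFFERENTIATION.search(report)
    return match.group(1).lower() if match else None


def accuracy(predicted: List[Optional[str]], reference: List[Optional[str]]) -> float:
    """Fraction of reports where the extracted value equals the curated label."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("predicted and reference must be non-empty and equal in length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Invented report snippets and manually curated labels (not study data).
reports = [
    "Hepatocellular carcinoma, moderately differentiated.",
    "Poorly-differentiated hepatocellular carcinoma.",
    "Benign liver parenchyma; no tumor identified.",
    "Hepatocellular carcinoma, grade 2.",
]
labels = ["moderately", "poorly", None, "moderately"]

predictions = [extract_differentiation(r) for r in reports]
print(accuracy(predictions, labels))  # 3 of 4 correct -> 0.75
```

The last invented report shows the typical weakness of a fixed pattern: the grade is expressed in wording the rule does not anticipate, so the extraction misses it. An LLM-based extractor would be scored against the same curated labels with the same `accuracy` function.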

RESULTS

88 pathology reports were identified. LLMs and REGEX were both able to extract research elements with high accuracy (average 84.1%-94.8%).

CONCLUSIONS

LLMs have significant potential to simplify the extraction of research elements from pathology reports and thereby accelerate the pace of cancer research.
