Extraction and classification of structured data from unstructured hepatobiliary pathology reports using large language models: a feasibility study compared with rules-based natural language processing.

Authors

Geevarghese Ruben, Sigel Carlie, Cadley John, Chatterjee Subrata, Jain Pulkit, Hollingsworth Alex, Chatterjee Avijit, Swinburne Nathaniel, Bilal Khawaja Hasan, Marinelli Brett

Affiliations

Division of Interventional Radiology, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York, USA.

Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA.

Publication information

J Clin Pathol. 2025 Jan 17;78(2):135-138. doi: 10.1136/jcp-2024-209669.

DOI: 10.1136/jcp-2024-209669
PMID: 39304201

Abstract

AIMS

Structured reporting in pathology is not universally adopted, and extracting elements essential to research often requires expensive and time-intensive manual curation. This study examines the accuracy and feasibility of using large language models (LLMs) to extract essential pathology elements for cancer research.

METHODS

Retrospective study of patients who underwent pathology sampling for suspected hepatocellular carcinoma and underwent Yttrium-90 embolisation. Five pathology report elements of interest were included for evaluation. LLMs (Generative Pre-trained Transformer (GPT) 3.5 turbo and GPT-4) were used to extract elements of interest. For comparison, a rules-based, regular expressions (REGEX) approach was devised for extraction. Accuracy for each approach was calculated.
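As a rough illustration of the rules-based arm and the accuracy calculation described above, the following is a minimal Python sketch. The element names (`diagnosis`, `grade`), the patterns, and the two sample reports are hypothetical: the abstract does not list the five elements, the regular expressions, or the prompts that the study used, so this should not be read as the authors' implementation.

```python
import re

# Hypothetical elements and patterns, chosen only for illustration.
# The abstract does not specify the study's five elements or its expressions.
PATTERNS = {
    "diagnosis": re.compile(r"hepatocellular carcinoma", re.IGNORECASE),
    "grade": re.compile(r"(well|moderately|poorly)[ -]differentiated", re.IGNORECASE),
}


def extract(report: str) -> dict:
    """Return the first match for each element, lower-cased, or None if absent."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(report)
        result[name] = match.group(0).lower() if match else None
    return result


def accuracy(predictions: list, references: list, element: str) -> float:
    """Fraction of reports where the extracted value equals the curated value."""
    correct = sum(p[element] == r[element] for p, r in zip(predictions, references))
    return correct / len(references)


# Made-up report snippets with manually curated reference labels.
reports = [
    "Liver, biopsy: Hepatocellular carcinoma, moderately differentiated.",
    "Liver, biopsy: Benign hepatic parenchyma with steatosis.",
]
references = [
    {"diagnosis": "hepatocellular carcinoma", "grade": "moderately differentiated"},
    {"diagnosis": None, "grade": None},
]

predictions = [extract(report) for report in reports]
for element in PATTERNS:
    print(element, accuracy(predictions, references, element))
```

The same `accuracy` function could score LLM output if each model response were parsed into a dictionary with the same keys, which would allow a like-for-like comparison of the two approaches against one set of curated labels.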

RESULTS

88 pathology reports were identified. LLMs and REGEX were both able to extract research elements with high accuracy (average 84.1%-94.8%).

CONCLUSIONS

LLMs have significant potential to simplify the extraction of research elements from pathology reporting and, therefore, to accelerate the pace of cancer research.

Similar articles

1
Extraction and classification of structured data from unstructured hepatobiliary pathology reports using large language models: a feasibility study compared with rules-based natural language processing.
J Clin Pathol. 2025 Jan 17;78(2):135-138. doi: 10.1136/jcp-2024-209669.
2
Large language models can accurately populate Vascular Quality Initiative procedural databases using narrative operative reports.
J Vasc Surg. 2025 Apr;81(4):973-982. doi: 10.1016/j.jvs.2024.12.002. Epub 2024 Dec 16.
3
Using Large Language Models to Automate Data Extraction From Surgical Pathology Reports: Retrospective Cohort Study.
JMIR Form Res. 2025 Apr 7;9:e64544. doi: 10.2196/64544.
4
Extracting structured information from unstructured histopathology reports using generative pre-trained transformer 4 (GPT-4).
J Pathol. 2024 Mar;262(3):310-319. doi: 10.1002/path.6232. Epub 2023 Dec 14.
5
Use of ChatGPT Large Language Models to Extract Details of Recommendations for Additional Imaging From Free-Text Impressions of Radiology Reports.
AJR Am J Roentgenol. 2025 Apr;224(4):e2432341. doi: 10.2214/AJR.24.32341. Epub 2025 Jan 29.
6
Large language models for data extraction from unstructured and semi-structured electronic health records: a multiple model performance evaluation.
BMJ Health Care Inform. 2025 Jan 19;32(1):e101139. doi: 10.1136/bmjhci-2024-101139.
7
Engineering of Generative Artificial Intelligence and Natural Language Processing Models to Accurately Identify Arrhythmia Recurrence.
Circ Arrhythm Electrophysiol. 2025 Jan;18(1):e013023. doi: 10.1161/CIRCEP.124.013023. Epub 2024 Dec 16.
8
Improving entity recognition using ensembles of deep learning and fine-tuned large language models: A case study on adverse event extraction from VAERS and social media.
J Biomed Inform. 2025 Mar;163:104789. doi: 10.1016/j.jbi.2025.104789. Epub 2025 Feb 7.
9
Extracting lung cancer staging descriptors from pathology reports: A generative language model approach.
J Biomed Inform. 2024 Sep;157:104720. doi: 10.1016/j.jbi.2024.104720. Epub 2024 Sep 2.
10
Programming techniques for improving rule readability for rule-based information extraction natural language processing pipelines of unstructured and semi-structured medical texts.
Health Informatics J. 2023 Apr-Jun;29(2):14604582231164696. doi: 10.1177/14604582231164696.

Cited by

1
Data Extraction and Curation from Radiology Reports for Pancreatic Cyst Surveillance Using Large Language Models.
J Am Coll Surg. 2025 Jul 10. doi: 10.1097/XCS.0000000000001478.
2
Responsible Artificial Intelligence governance in oncology.
NPJ Digit Med. 2025 Jul 4;8(1):407. doi: 10.1038/s41746-025-01794-w.