Development and Validation of a Natural Language Processing Algorithm for Extracting Clinical and Pathological Features of Breast Cancer From Pathology Reports.

Affiliations

Division of Medical Senology, European Institute of Oncology IRCCS, Milan, Italy.

Division of Early Drug Development for Innovative Therapies, European Institute of Oncology IRCCS, Milan, Italy.

Publication information

JCO Clin Cancer Inform. 2024 Aug;8:e2400034. doi: 10.1200/CCI.24.00034.

DOI: 10.1200/CCI.24.00034
PMID: 39137368

Abstract

PURPOSE

Electronic health records (EHRs) are valuable information repositories that offer insights for enhancing clinical research on breast cancer (BC) using real-world data. The objective of this study was to develop a natural language processing (NLP) model specifically designed to extract structured data from BC pathology reports written in natural language.

METHODS

During the initial phase, the algorithm's development cohort comprised 193 pathology reports from 116 patients with BC from 2012 to 2016. A rule-based NLP algorithm was applied to extract 26 variables for analysis and was compared with the manual extraction of data performed by both a data entry specialist and an oncologist. Following the first approach, the data set was expanded to include 513 reports, and a Named Entity Recognition (NER)-NLP model was trained and evaluated using K-fold cross-validation.
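
The abstract does not give the study's actual rules, so the following is only a minimal sketch of what a rule-based extraction step of this kind might look like, assuming English-language reports and percentage-style biomarker values. The patterns, the `extract_features` helper, and the sample report text are all hypothetical illustrations, not the authors' implementation.

```python
import re

# Hypothetical patterns for four of the variables named in the abstract.
# The real algorithm extracted 26 variables; its rules are not published here.
PATTERNS = {
    "ER": re.compile(r"\b(?:ER|estrogen receptor)\b\D{0,20}?(\d{1,3})\s*%", re.I),
    "PR": re.compile(r"\b(?:PR|PgR|progesterone receptor)\b\D{0,20}?(\d{1,3})\s*%", re.I),
    "Ki-67": re.compile(r"\bKi-?67\b\D{0,20}?(\d{1,3})\s*%", re.I),
    "HER2": re.compile(r"\bHER-?2\b\D{0,20}?([0-3]\+?)", re.I),
}


def extract_features(report: str) -> dict:
    """Return the first match for each variable, or None when a rule does not fire."""
    features = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(report)
        features[name] = match.group(1) if match else None
    return features


# Invented example text, used only to exercise the rules above.
sample_report = (
    "Invasive ductal carcinoma. ER: 90%. PgR: 40%. "
    "Ki-67: 25%. HER2 score 2+."
)
result = extract_features(sample_report)
print(result)
```

Rules like these are transparent and easy to audit, but they miss phrasings the patterns do not anticipate, which is one plausible reason to move on to a trained NER model as the study did.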

RESULTS

The first approach led to a concordance analysis, which revealed an 82.9% agreement between the algorithm and the oncologist, whereas the concordance between the data entry specialist and the oncologist was 90.8%. The second training approach introduced the definition of an NER-NLP model, in which the accuracy showed remarkable potential (97.8%). Notably, the model demonstrated remarkable performance, especially for parameters such as estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2, and Ki-67 (F1-score 1.0).
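
For readers unfamiliar with the two kinds of figures reported here, the sketch below shows one common way to compute a percent agreement between two sets of extracted values and an F1-score from entity-level counts. The abstract does not state exactly how concordance was calculated, so simple field-by-field matching is an assumption, and the function names and sample values are hypothetical.

```python
def percent_agreement(extracted, reference):
    """Share of fields (in percent) where two extractions give the same value."""
    if len(extracted) != len(reference) or not extracted:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(extracted, reference))
    return 100.0 * matches / len(extracted)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented values: algorithm output versus a reference annotator for four fields.
algorithm = ["positive", "negative", "positive", "2+"]
oncologist = ["positive", "negative", "negative", "2+"]
agreement = percent_agreement(algorithm, oncologist)

# An F1-score of 1.0 means no false positives and no false negatives.
perfect = f1_score(tp=10, fp=0, fn=0)
print(agreement, perfect)
```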

CONCLUSION

The present study aligns with the rapidly evolving field of artificial intelligence (AI) applications in oncology, seeking to expedite the development of complex cancer databases and registries. The results of the model are currently undergoing postprocessing procedures to organize the data into tabular structures, facilitating their utilization in real-world clinical and research endeavors.

Similar articles

1. Development and Validation of a Natural Language Processing Algorithm for Extracting Clinical and Pathological Features of Breast Cancer From Pathology Reports.
   JCO Clin Cancer Inform. 2024 Aug;8:e2400034. doi: 10.1200/CCI.24.00034.
2. Extracting lung cancer staging descriptors from pathology reports: A generative language model approach.
   J Biomed Inform. 2024 Sep;157:104720. doi: 10.1016/j.jbi.2024.104720. Epub 2024 Sep 2.
3. Mining Clinical Notes for Physical Rehabilitation Exercise Information: Natural Language Processing Algorithm Development and Validation Study.
   JMIR Med Inform. 2024 Apr 3;12:e52289. doi: 10.2196/52289.
4. Leveraging Rule-Based NLP to Translate Textual Reports as Structured Inputs Automatically Processed by a Clinical Decision Support System.
   Stud Health Technol Inform. 2024 Aug 22;316:1861-1865. doi: 10.3233/SHTI240794.
5. Correlating mammographic and pathologic findings in clinical decision support using natural language processing and data mining methods.
   Cancer. 2017 Jan 1;123(1):114-121. doi: 10.1002/cncr.30245. Epub 2016 Aug 29.
6. Development and Validation of a Model to Identify Critical Brain Injuries Using Natural Language Processing of Text Computed Tomography Reports.
   JAMA Netw Open. 2022 Aug 1;5(8):e2227109. doi: 10.1001/jamanetworkopen.2022.27109.
7. Automated medical chart review for breast cancer outcomes research: a novel natural language processing extraction system.
   BMC Med Res Methodol. 2022 May 12;22(1):136. doi: 10.1186/s12874-022-01583-z.
8. Machine learning to parse breast pathology reports in Chinese.
   Breast Cancer Res Treat. 2018 Jun;169(2):243-250. doi: 10.1007/s10549-018-4668-3. Epub 2018 Jan 29.
9. Identification of pancreatic cancer risk factors from clinical notes using natural language processing.
   Pancreatology. 2024 Jun;24(4):572-578. doi: 10.1016/j.pan.2024.03.016. Epub 2024 Mar 26.
10. Extracting comprehensive clinical information for breast cancer using deep learning methods.
    Int J Med Inform. 2019 Dec;132:103985. doi: 10.1016/j.ijmedinf.2019.103985. Epub 2019 Oct 2.

Cited by

1. Open-Source Hybrid Large Language Model Integrated System for Extraction of Breast Cancer Treatment Pathway From Free-Text Clinical Notes.
   JCO Clin Cancer Inform. 2025 Jun;9:e2500002. doi: 10.1200/CCI-25-00002. Epub 2025 Jun 27.