Natural language processing for local, regional, and distant breast cancer relapse identification in pathology reports.

Authors

Lee Jaimie J, Jettinghoff William, Arbour Gregory, Zepeda Andres, Isaac Kathryn V, Ng Raymond T, Nichol Alan M

Affiliations

Department of Radiation Oncology, BC Cancer, Vancouver, BC, Canada.

Department of Surgery, University of British Columbia, Vancouver, BC, Canada.

Publication information

Breast Cancer Res Treat. 2025 Sep 2. doi: 10.1007/s10549-025-07801-8.

DOI: 10.1007/s10549-025-07801-8
PMID: 40897953

Abstract

PURPOSE

Cancer registries rarely track breast cancer relapse due to the resource-intensive nature of manual chart review. To address this gap, we developed natural language processing (NLP) models to automate the identification of breast cancer relapse in pathology reports.

METHODS

We collected pathology reports from patients diagnosed with breast cancer between January 1, 2005, and December 31, 2014, in British Columbia, Canada, and manually annotated each for the presence or absence of local, regional, distant, and any breast cancer relapses. With these reports, we fine-tuned large language models to classify pathology reports.
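The annotation scheme described in METHODS can be pictured as four binary labels per report. The following is a minimal sketch and not the authors' code: the class name `ReportLabels`, its field names, and the rule that the "any relapse" label is the logical OR of the three site-specific labels are all assumptions made for illustration. The abstract does not say how the "any" label was derived.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReportLabels:
    """Hypothetical per-report annotation: one binary label per relapse site."""

    local: bool
    regional: bool
    distant: bool

    @property
    def any_relapse(self) -> bool:
        # Assumption: "any relapse" is true when at least one site label is true.
        return self.local or self.regional or self.distant

    def as_targets(self) -> list[int]:
        # Fixed order: local, regional, distant, any (one target per classifier).
        return [
            int(self.local),
            int(self.regional),
            int(self.distant),
            int(self.any_relapse),
        ]


# Invented example labels, not data from the study.
example = ReportLabels(local=True, regional=False, distant=True)
print(example.as_targets())
```

Keeping one binary target per relapse type matches the abstract's description of separate results for the local-relapse model, although the actual model setup is not specified there.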

RESULTS

The corpus contained 1,888 pathology reports from a cohort of 993 breast cancer patients. Of these reports, 673 (35.6%) described local, 296 (15.7%) regional, and 654 (34.6%) distant relapses, and 1,510 (80.0%) described at least one relapse of any type. The median time from diagnosis to first relapse was 7.3 years (range 0.2-18.2). All models demonstrated excellent performance. The local-relapse model performed particularly well, with accuracy, sensitivity, and specificity all above 93% and an area under the receiver operating characteristic curve (AUC) of 0.98.
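As a reading aid, the sketch below recomputes the reported percentages from the report counts and shows how accuracy, sensitivity, specificity, and AUC are conventionally defined for a binary classifier. The counts come from the abstract. The function names, the toy labels and scores, and the 0.5 decision threshold are assumptions for illustration and do not reproduce the study's evaluation.

```python
def percent(count: int, total: int) -> float:
    """Share of `count` in `total`, as a percentage rounded to one decimal."""
    return round(100.0 * count / total, 1)


# Counts reported in the abstract.
TOTAL_REPORTS = 1888
REPORTED_COUNTS = {"local": 673, "regional": 296, "distant": 654, "any": 1510}


def confusion_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary 0/1 labels.

    Assumes both classes are present in `y_true`.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


for label, count in REPORTED_COUNTS.items():
    print(label, percent(count, TOTAL_REPORTS))

# Toy labels and scores, invented for illustration only.
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

print(confusion_metrics(toy_true, toy_pred))
print(roc_auc(toy_true, toy_scores))
```

The three site-specific counts sum to more than the "any" count (1,623 versus 1,510), so some reports must describe more than one relapse site.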

CONCLUSION

We developed NLP models to detect breast cancer relapses from pathology reports with excellent accuracy, sensitivity, specificity, and AUC. NLP may facilitate more efficient and accurate collection of breast cancer outcomes data from clinical reports.
