Automated Identification of Breast Cancer Relapse in Computed Tomography Reports Using Natural Language Processing.

Authors

Lee Jaimie J, Zepeda Andres, Arbour Gregory, Isaac Kathryn V, Ng Raymond T, Nichol Alan M

Affiliations

Department of Radiation Oncology, BC Cancer, Vancouver, BC, Canada.

Department of Surgery, University of British Columbia, Vancouver, BC, Canada.

Publication information

JCO Clin Cancer Inform. 2024 Dec;8:e2400107. doi: 10.1200/CCI.24.00107. Epub 2024 Dec 20.

DOI:10.1200/CCI.24.00107

PMID:39705642

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11670918/

Abstract

PURPOSE

Breast cancer relapses are rarely collected by cancer registries because of logistical and financial constraints. Hence, we investigated natural language processing (NLP), enhanced with state-of-the-art deep learning transformer tools and large language models, to automate relapse identification in the text of computed tomography (CT) reports.

METHODS

We analyzed follow-up CT reports from patients diagnosed with breast cancer between January 1, 2005, and December 31, 2014. The reports were curated and annotated for the presence or absence of local, regional, and distant breast cancer relapses. We performed 10-fold cross-validation to evaluate models identifying different types of relapses in CT reports. Model performance was assessed with classification metrics, reported with 95% confidence intervals.
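
The 10-fold cross-validation described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the abstract does not say whether the folds were stratified, so the stratified split here is an assumption, and the function name `stratified_k_fold` is hypothetical. The label vector only mimics the local-relapse class balance reported in the abstract (72 positives among 1,445 reports).

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class balance roughly equal per fold."""
    rng = random.Random(seed)

    # Group report indices by their label (1 = relapse, 0 = no relapse).
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share of the class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Assumed label vector with the local-relapse class balance from the abstract.
labels = [1] * 72 + [0] * (1445 - 72)
splits = list(stratified_k_fold(labels, k=10, seed=0))

for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    positives = sum(labels[i] for i in test_idx)
    print(f"fold {fold_number}: train={len(train_idx)} test={len(test_idx)} positives={positives}")
```

Each report is held out exactly once, so every prediction used for evaluation comes from a model that never saw that report during training. With only 72 local relapses, stratifying keeps 7 or 8 positives in every test fold, where a purely random split could leave some folds with very few.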

RESULTS

In our data set of 1,445 CT reports, 799 (55.3%) described any relapse, 72 (5.0%) local relapses, 97 (6.7%) regional relapses, and 743 (51.4%) distant relapses. The any-relapse model achieved an accuracy of 89.6% (87.8-91.1), with a sensitivity of 93.2% (91.4-94.9) and a specificity of 84.2% (80.9-87.1). The local relapse model achieved an accuracy of 94.6% (93.3-95.7), a sensitivity of 44.4% (32.8-56.3), and a specificity of 97.2% (96.2-98.0). The regional relapse model showed an accuracy of 93.6% (92.3-94.9), a sensitivity of 70.1% (60.0-79.1), and a specificity of 95.3% (94.2-96.5). Finally, the distant relapse model demonstrated an accuracy of 88.1% (86.2-89.7), a sensitivity of 91.8% (89.9-93.8), and a specificity of 83.7% (80.5-86.4).
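
As a worked example of how these figures relate, the sketch below derives accuracy, sensitivity, and specificity from confusion-matrix counts and attaches a 95% interval to each. Two assumptions are worth flagging. First, the counts are hypothetical: they were back-calculated so that they reproduce the local relapse point estimates (72 positives, 1,373 negatives) and are not taken from the paper. Second, the abstract does not state how its confidence intervals were computed, so the Wilson score interval used here is a stand-in and its bounds need not match the published ones.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


def classification_metrics(tp, fn, tn, fp):
    """Return each metric as a (numerator, denominator) pair."""
    return {
        "accuracy": (tp + tn, tp + fn + tn + fp),
        "sensitivity": (tp, tp + fn),   # relapses correctly flagged
        "specificity": (tn, tn + fp),   # relapse-free reports correctly cleared
    }


# Hypothetical counts chosen to match the local relapse point estimates.
counts = {"tp": 32, "fn": 40, "tn": 1335, "fp": 38}

for name, (hits, total) in classification_metrics(**counts).items():
    low, high = wilson_interval(hits, total)
    print(f"{name}: {100 * hits / total:.1f}% ({100 * low:.1f}-{100 * high:.1f})")
```

The same arithmetic shows why accuracy alone can mislead for the rarer relapse types: with 72 of 1,445 reports positive, labeling every report as negative would already be correct for 1,373 of them, or 95.0%, so sensitivity is the more informative figure for the local and regional models.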

CONCLUSION

We developed NLP models to identify local, regional, and distant breast cancer relapses from CT reports. Automating the identification of breast cancer relapses can enhance data collection about patient outcomes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e55f/11670918/5d01fe215de6/cci-8-e2400107-g001.jpg
