Natural Language Processing to Automatically Extract the Presence and Severity of Esophagitis in Notes of Patients Undergoing Radiotherapy.

Affiliations

Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA.

Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA.

Publication information

JCO Clin Cancer Inform. 2023 Jul;7:e2300048. doi: 10.1200/CCI.23.00048.

DOI:10.1200/CCI.23.00048

PMID:37506330

Abstract

PURPOSE

Radiotherapy (RT) toxicities can impair survival and quality of life, yet remain understudied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT.

METHODS

Our corpus consisted of a gold-labeled data set of 1,524 clinical notes from 124 patients with lung cancer treated with RT, manually annotated for Common Terminology Criteria for Adverse Events (CTCAE) v5.0 esophagitis grade, and a silver-labeled data set of 2,420 notes from 1,832 patients from whom toxicity grades had been collected as structured data during clinical care. We fine-tuned statistical and pretrained Bidirectional Encoder Representations from Transformers-based models for three esophagitis classification tasks: task 1, no esophagitis versus grade 1-3; task 2, grade ≤1 versus >1; and task 3, no esophagitis versus grade 1 versus grade 2-3. Transferability was tested on 345 notes from patients with esophageal cancer undergoing RT.
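The three label schemes can be read as mappings from a CTCAE esophagitis grade to a class index. The following is a minimal sketch under stated assumptions, not the authors' code: the helper name `task_labels` and the integer class encodings are hypothetical, and grades are assumed to be integers 0-3, with 0 meaning no esophagitis.

```python
def task_labels(grade: int) -> dict:
    """Map a CTCAE esophagitis grade (0-3) to a class index for each task.

    Hypothetical helper; the class encodings are illustrative assumptions.
    """
    if grade not in (0, 1, 2, 3):
        raise ValueError(f"expected a grade in 0-3, got {grade!r}")
    return {
        "task1": int(grade >= 1),  # no esophagitis vs grade 1-3
        "task2": int(grade > 1),   # grade <=1 vs grade >1
        "task3": min(grade, 2),    # no esophagitis vs grade 1 vs grade 2-3
    }


if __name__ == "__main__":
    for g in range(4):
        print(g, task_labels(g))
```

Under this encoding, task 1 and task 2 are binary problems and task 3 is a three-class problem, which may be one reason task 3 is the hardest of the three.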

RESULTS

Fine-tuning of PubMedBERT yielded the best performance. The best macro-F1 was 0.92, 0.82, and 0.74 for tasks 1, 2, and 3, respectively. Selecting the most informative note sections during fine-tuning improved macro-F1 by ≥2% for all tasks. Silver-labeled data improved the macro-F1 by ≥3% across all tasks. For the esophageal cancer notes, the best macro-F1 was 0.73, 0.74, and 0.65 for tasks 1, 2, and 3, respectively, without additional fine-tuning.
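Macro-F1 is the unweighted mean of the per-class F1 scores, so a rare class (for example, grade 2-3) counts as much as a common one. The sketch below implements that standard definition in plain Python; the abstract does not describe the authors' evaluation code, and the function name `macro_f1` and the toy labels are illustrative assumptions.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy task-3 style labels (0 = none, 1 = grade 1, 2 = grade 2-3)
    y_true = [0, 0, 1, 2]
    y_pred = [0, 1, 1, 2]
    print(round(macro_f1(y_true, y_pred), 4))
```

In the toy example the per-class F1 scores are 2/3, 2/3, and 1, so the macro average is 7/9.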

CONCLUSION

To our knowledge, this is the first effort to automatically extract esophagitis toxicity severity according to CTCAE guidelines from clinical notes. This provides proof of concept for NLP-based automated detailed toxicity monitoring in expanded domains.

Similar articles

Natural Language Processing to Automatically Extract the Presence and Severity of Esophagitis in Notes of Patients Undergoing Radiotherapy.

JCO Clin Cancer Inform. 2023 Jul;7:e2300048. doi: 10.1200/CCI.23.00048.

Extracting comprehensive clinical information for breast cancer using deep learning methods.

Int J Med Inform. 2019 Dec;132:103985. doi: 10.1016/j.ijmedinf.2019.103985. Epub 2019 Oct 2.

Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing.

J Biomed Inform. 2022 Mar;127:103984. doi: 10.1016/j.jbi.2021.103984. Epub 2022 Jan 7.

Transformers-sklearn: a toolkit for medical language understanding with transformer-based models.

BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):90. doi: 10.1186/s12911-021-01459-0.

An End-to-End Natural Language Processing System for Automatically Extracting Radiation Therapy Events From Clinical Texts.

Int J Radiat Oncol Biol Phys. 2023 Sep 1;117(1):262-273. doi: 10.1016/j.ijrobp.2023.03.055. Epub 2023 Mar 27.

Predictors of severe esophagitis include use of concurrent chemotherapy, but not the length of irradiated esophagus: a multivariate analysis of patients with lung cancer treated with nonoperative therapy.

Int J Radiat Oncol Biol Phys. 2000 Oct 1;48(3):689-96. doi: 10.1016/s0360-3016(00)00699-4.

A Question-and-Answer System to Extract Data From Free-Text Oncological Pathology Reports (CancerBERT Network): Development Study.

J Med Internet Res. 2022 Mar 23;24(3):e27210. doi: 10.2196/27210.

When BERT meets Bilbo: a learning curve analysis of pretrained language model on disease classification.

BMC Med Inform Decis Mak. 2022 Apr 5;21(Suppl 9):377. doi: 10.1186/s12911-022-01829-2.

Ensembles of natural language processing systems for portable phenotyping solutions.

J Biomed Inform. 2019 Dec;100:103318. doi: 10.1016/j.jbi.2019.103318. Epub 2019 Oct 23.

Natural language processing for abstraction of cancer treatment toxicities: accuracy versus human experts.

JAMIA Open. 2020 Dec 5;3(4):513-517. doi: 10.1093/jamiaopen/ooaa064. eCollection 2020 Dec.

Cited by

Large Language Models for Adverse Drug Events: A Clinical Perspective.

J Clin Med. 2025 Aug 4;14(15):5490. doi: 10.3390/jcm14155490.

Informatics at the Frontier of Cancer Research.

Cancer Res. 2025 Aug 15;85(16):2967-2986. doi: 10.1158/0008-5472.CAN-24-2829.

A Narrative Review on the Application of Large Language Models to Support Cancer Care and Research.

Yearb Med Inform. 2024 Aug;33(1):90-98. doi: 10.1055/s-0044-1800726. Epub 2025 Apr 8.

Large Language Model Applications for Health Information Extraction in Oncology: Scoping Review.

JMIR Cancer. 2025 Mar 28;11:e65984. doi: 10.2196/65984.

An automatic pipeline for temporal monitoring of radiotherapy-induced toxicities in head and neck cancer patients.

NPJ Precis Oncol. 2025 Feb 7;9(1):40. doi: 10.1038/s41698-025-00824-w.

Artificial intelligence research in radiation oncology: a practical guide for the clinician on concepts and methods.

BJR Open. 2024 Nov 13;6(1):tzae039. doi: 10.1093/bjro/tzae039. eCollection 2024 Jan.

Large Language Models in Biomedical and Health Informatics: A Review with Bibliometric Analysis.

J Healthc Inform Res. 2024 Sep 14;8(4):658-711. doi: 10.1007/s41666-024-00171-8. eCollection 2024 Dec.

Advancing the Collaboration Between Imaging and Radiation Oncology.

Semin Radiat Oncol. 2024 Oct;34(4):402-417. doi: 10.1016/j.semradonc.2024.07.005.

Data Science Opportunities To Improve Radiotherapy Planning and Clinical Decision Making.

Semin Radiat Oncol. 2024 Oct;34(4):379-394. doi: 10.1016/j.semradonc.2024.07.012.

An innovative method to strengthen evidence for potential drug safety signals using Electronic Health Records.

J Med Syst. 2024 May 16;48(1):51. doi: 10.1007/s10916-024-02070-2.
