Natural Language Processing to Automatically Extract the Presence and Severity of Esophagitis in Notes of Patients Undergoing Radiotherapy.

Affiliations

Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA.

Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA.

Publication information

JCO Clin Cancer Inform. 2023 Jul;7:e2300048. doi: 10.1200/CCI.23.00048.

Abstract

PURPOSE

Radiotherapy (RT) toxicities can impair survival and quality of life, yet remain understudied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often recorded only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT.

METHODS

Our corpus consisted of a gold-labeled data set of 1,524 clinical notes from 124 patients with lung cancer treated with RT, manually annotated for National Cancer Institute Common Terminology Criteria for Adverse Events (CTCAE) v5.0 esophagitis grade, and a silver-labeled data set of 2,420 notes from 1,832 patients whose toxicity grades had been recorded as structured data during clinical care. We trained statistical models and fine-tuned pretrained Bidirectional Encoder Representations from Transformers (BERT)-based models for three esophagitis classification tasks: task 1, no esophagitis versus grade 1-3; task 2, grade ≤1 versus grade >1; and task 3, no esophagitis versus grade 1 versus grade 2-3. Transferability was tested on 345 notes from patients with esophageal cancer undergoing RT.
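The three tasks are different groupings of the same CTCAE esophagitis grade. A minimal Python sketch of that mapping follows; the function name `task_label` is hypothetical, and it assumes each note carries a single integer grade from 0 to 3, with 0 meaning no esophagitis, since the task definitions only cover grades up to 3.

```python
def task_label(grade: int, task: int) -> int:
    """Map a CTCAE v5.0 esophagitis grade to a class label for one task.

    Hypothetical helper. Assumes grade is an integer 0-3 (0 = no esophagitis).
    Task 1: no esophagitis -> 0, grade 1-3 -> 1
    Task 2: grade <=1 -> 0, grade >1 -> 1
    Task 3: no esophagitis -> 0, grade 1 -> 1, grade 2-3 -> 2
    """
    # Assumption: only grades 0-3 occur, matching the task definitions.
    if grade not in (0, 1, 2, 3):
        raise ValueError(f"expected a grade in 0-3, got {grade}")
    if task == 1:
        # Presence vs absence of esophagitis.
        return 0 if grade == 0 else 1
    if task == 2:
        # Mild or none vs grade 2 and above.
        return 0 if grade <= 1 else 1
    if task == 3:
        # Three classes; grades 2 and 3 are merged into one class.
        return min(grade, 2)
    raise ValueError(f"unknown task {task}")


if __name__ == "__main__":
    for task in (1, 2, 3):
        print(task, [task_label(g, task) for g in range(4)])
```

Storing one grade per note and deriving each task's labels from it lets the same annotation serve all three tasks.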

RESULTS

Fine-tuning PubMedBERT yielded the best performance. The best macro-F1 was 0.92, 0.82, and 0.74 for tasks 1, 2, and 3, respectively. Selecting the most informative note sections during fine-tuning improved macro-F1 by ≥2% for all tasks. Adding silver-labeled data improved macro-F1 by ≥3% across all tasks. Without additional fine-tuning, the best macro-F1 on the esophageal cancer notes was 0.73, 0.74, and 0.65 for tasks 1, 2, and 3, respectively.
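Macro-F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of how many notes it contains. The sketch below computes it from scratch; `macro_f1` is a hypothetical helper, and the abstract does not say which implementation the authors used (scikit-learn's `f1_score` with `average="macro"` should compute the same quantity).

```python
def macro_f1(y_true: list[int], y_pred: list[int]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        # Count true positives, false positives and false negatives for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean of
        # precision and recall; the guard is defensive only.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    # Macro averaging: each class contributes equally.
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Toy task-3 style labels: 0 = none, 1 = grade 1, 2 = grade 2-3.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    print(round(macro_f1(y_true, y_pred), 3))  # → 0.656
```

In the toy example the per-class F1 scores are 0.5, 0.8, and about 0.667, and their plain average is about 0.656.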

CONCLUSION

To our knowledge, this is the first effort to automatically extract esophagitis toxicity severity according to CTCAE guidelines from clinical notes. This work provides a proof of concept for detailed, automated NLP-based toxicity monitoring in expanded domains.
