通过电子健康记录加速检查点抑制剂诱导的结肠炎病例的管理。

Accelerated curation of checkpoint inhibitor-induced colitis cases from electronic health records.

作者信息

Rahman Protiva, Ye Cheng, Mittendorf Kathleen F, Lenoue-Newton Michele, Micheel Christine, Wolber Jan, Osterman Travis, Fabbri Daniel

机构信息

Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.

Vanderbilt Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee, USA.

出版信息

JAMIA Open. 2023 Apr 1;6(1):ooad017. doi: 10.1093/jamiaopen/ooad017. eCollection 2023 Apr.

DOI:10.1093/jamiaopen/ooad017

PMID:37012912

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC10066800/

Abstract

OBJECTIVE

Automatically identifying patients at risk of immune checkpoint inhibitor (ICI)-induced colitis allows physicians to improve patientcare. However, predictive models require training data curated from electronic health records (EHR). Our objective is to automatically identify notes documenting ICI-colitis cases to accelerate data curation.

MATERIALS AND METHODS

We present a data pipeline to automatically identify ICI-colitis from EHR notes, accelerating chart review. The pipeline relies on BERT, a state-of-the-art natural language processing (NLP) model. The first stage of the pipeline segments long notes using keywords identified through a logistic classifier and applies BERT to identify ICI-colitis notes. The next stage uses a second BERT model tuned to identify false positive notes and remove notes that were likely positive for mentioning colitis as a side-effect. The final stage further accelerates curation by highlighting the colitis-relevant portions of notes. Specifically, we use BERT's attention scores to find high-density regions describing colitis.

RESULTS

The overall pipeline identified colitis notes with 84% precision and reduced the curator note review load by 75%. The segment BERT classifier had a high recall of 0.98, which is crucial to identify the low incidence (<10%) of colitis.

DISCUSSION

Curation from EHR notes is a burdensome task, especially when the curation topic is complicated. Methods described in this work are not only useful for ICI colitis but can also be adapted for other domains.

CONCLUSION

Our extraction pipeline reduces manual note review load and makes EHR data more accessible for research.

摘要

目的

自动识别有免疫检查点抑制剂（ICI）诱导性结肠炎风险的患者，有助于医生改善患者护理。然而，预测模型需要从电子健康记录（EHR）中整理出的训练数据。我们的目标是自动识别记录ICI结肠炎病例的笔记，以加速数据整理。

材料与方法

我们提出了一种数据管道，用于从EHR笔记中自动识别ICI结肠炎，从而加速病历审查。该管道依赖于BERT，这是一种先进的自然语言处理（NLP）模型。管道的第一阶段使用通过逻辑分类器识别的关键词对长笔记进行分段，并应用BERT来识别ICI结肠炎笔记。下一阶段使用第二个经过调整的BERT模型来识别假阳性笔记，并删除那些可能因提及结肠炎作为副作用而呈阳性的笔记。最后阶段通过突出显示笔记中与结肠炎相关的部分，进一步加速整理过程。具体而言，我们使用BERT的注意力分数来找到描述结肠炎的高密度区域。