Li Rumeng, Hu Baotian, Liu Feifan, Liu Weisong, Cunningham Francesca, McManus David D, Yu Hong
College of Information and Computer Science, University of Massachusetts Amherst, Amherst, MA, United States.
Department of Computer Science, University of Massachusetts Lowell, Lowell, MA, United States.
JMIR Med Inform. 2019 Feb 8;7(1):e10788. doi: 10.2196/10788.
Bleeding events are common and critical and may cause significant morbidity and mortality. High incidences of bleeding events are associated with cardiovascular disease in patients on anticoagulant therapy. Prompt and accurate detection of bleeding events is essential to prevent serious consequences. As bleeding events are often described in clinical notes, automatic detection of bleeding events from electronic health record (EHR) notes may improve drug-safety surveillance and pharmacovigilance.
We aimed to develop a natural language processing (NLP) system to automatically classify whether an EHR note sentence contains a bleeding event.
Expert annotators labeled 878 EHR notes (76,577 sentences and 562,630 word tokens) to identify bleeding events at the sentence level. This annotated corpus was used to train and validate our NLP systems. We developed an innovative hybrid convolutional neural network (CNN) and long short-term memory (LSTM) autoencoder (HCLA) model that integrates a CNN architecture with a bidirectional LSTM (BiLSTM) autoencoder to leverage large unlabeled EHR data.
HCLA achieved the best area under the receiver operating characteristic curve (0.957) and F1 score (0.938) for identifying whether a sentence contains a bleeding event, surpassing the strong support vector machine baseline and other CNN and autoencoder models.
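For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how F1 and the area under the ROC curve can be computed for a binary sentence classifier. It is not the authors' evaluation code. The function names, the toy labels and scores, and the 0.5 decision threshold are all illustrative assumptions, and the abstract does not state how F1 was averaged, so positive-class F1 is shown here.

```python
def f1_score(y_true, y_pred):
    """Positive-class F1: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives, so precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as half), which equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = sentence contains a bleeding event.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]          # made-up model probabilities
preds = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 threshold

print(f"F1  = {f1_score(labels, preds):.3f}")
print(f"AUC = {roc_auc(labels, scores):.3f}")
```

F1 depends on the chosen threshold, whereas AUC is computed from the ranking of scores alone, which is one reason the two are commonly reported together.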
By incorporating a supervised CNN model and a pretrained unsupervised BiLSTM autoencoder, the HCLA achieved high performance in detecting bleeding events.