Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA, 01609, USA.
Drug Saf. 2019 Jan;42(1):113-122. doi: 10.1007/s40264-018-0765-9.
Adverse drug event (ADE) detection is a vital step towards effective pharmacovigilance and prevention of future incidents caused by potentially harmful ADEs. The electronic health records (EHRs) of patients in hospitals contain valuable information regarding ADEs and hence are an important source for detecting ADE signals. However, EHR texts tend to be noisy, and applying off-the-shelf tools for EHR text preprocessing jeopardizes the subsequent ADE detection performance, which depends on well-tokenized text input.
In this paper, we report our experience with the NLP Challenges for Detecting Medication and Adverse Drug Events from Electronic Health Records (MADE1.0), which aims to promote deep innovations on this subject. In particular, we have developed rule-based sentence and word tokenization techniques to deal with the noise in the EHR text.
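To make the idea of rule-based sentence and word tokenization concrete, the following is a minimal sketch in Python. The specific rules (the regular expressions `_SENT_BOUNDARY` and `_TOKEN`, and the function names) are hypothetical and chosen only for illustration; the abstract does not list the rules actually used.

```python
import re

# Hypothetical rules for illustration only; not the rules used in the paper.
# A sentence ends at ., ! or ? followed by whitespace and a capital letter,
# or at a blank line (common in clinical notes that lack punctuation).
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])|\n{2,}")

# A token is a number (optionally decimal), a run of letters,
# or any single non-space symbol. This separates "2.5mg" into "2.5" and "mg".
_TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+|[^\sA-Za-z\d]")


def split_sentences(text):
    """Split raw note text into sentences using the boundary rule above."""
    return [s.strip() for s in _SENT_BOUNDARY.split(text) if s.strip()]


def tokenize(sentence):
    """Split one sentence into word, number, and symbol tokens."""
    return _TOKEN.findall(sentence)


note = "Pt started on warfarin 5mg. Developed rash after 2 days.\n\nNo fever"
sentences = split_sentences(note)
tokens = [tokenize(s) for s in sentences]
```

A rule set this small will mis-split on abbreviations such as "Dr. Smith"; a practical system would need additional rules for such cases.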
We propose a detection methodology that adapts a three-layer deep learning architecture consisting of (1) a recurrent neural network [bi-directional long short-term memory (Bi-LSTM)] for character-level word representation, which encodes the morphological features of medical terminology, (2) a Bi-LSTM for capturing the contextual information of each word within a sentence, and (3) conditional random fields for the final label prediction, which also take the labels of surrounding words into account. We experiment with different word embedding methods commonly used in word-level classification tasks and demonstrate the impact of using domain-specific and general-purpose pre-trained word embeddings together for detecting ADEs from EHRs.
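The role of the conditional random field layer can be illustrated with a small sketch of Viterbi decoding for a linear-chain CRF. This is not the authors' implementation: the label set, the emission scores (which in the described architecture would come from the word-level Bi-LSTM), and the transition scores are all made-up values for illustration.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions:   list (one entry per token) of dicts mapping label -> score
    transitions: dict mapping (previous_label, label) -> score (default 0.0)
    """
    # best[i][y] is the best score of any label path ending in y at token i.
    best = [{y: emissions[0][y] for y in labels}]
    back = []
    for i in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels,
                       key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = (best[-1][prev] + transitions.get((prev, y), 0.0)
                         + emissions[i][y])
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


labels = ["O", "B-ADE", "I-ADE"]
# Illustrative per-token scores for the two tokens "severe rash".
emissions = [
    {"O": 1.0, "B-ADE": 0.8, "I-ADE": 0.1},
    {"O": 0.2, "B-ADE": 0.3, "I-ADE": 1.0},
]
# Illustrative transition score: an entity cannot continue directly after "O".
transitions = {("O", "I-ADE"): -10.0}

greedy = [max(labels, key=lambda y: e[y]) for e in emissions]
decoded = viterbi(emissions, transitions, labels)
```

Choosing the best label per token independently gives `O, I-ADE`, an invalid sequence. Decoding over the whole sentence with the transition penalty gives `B-ADE, I-ADE`, which shows how considering neighboring labels changes the prediction.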
Our system was ranked first for the named entity recognition task in the MADE1.0 challenge, with a micro-averaged F1-score of 0.8290 (official score).
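For readers unfamiliar with the metric, the following sketch shows how a micro-averaged F1-score can be computed by pooling true positives, false positives, and false negatives across all entity types. The exact-match criterion and the tuple layout `(doc_id, start, end, entity_type)` are assumptions made for illustration; the official MADE1.0 scorer may differ in its details.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over sets of (doc_id, start, end, entity_type) tuples.

    Counts are pooled across all entity types before computing
    precision and recall (assumed exact-match criterion).
    """
    tp = len(gold & predicted)      # entities found with exact span and type
    fp = len(predicted - gold)      # predicted entities not in the gold set
    fn = len(gold - predicted)      # gold entities that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up annotations for a single hypothetical note.
gold = {("note1", 0, 8, "Drug"), ("note1", 20, 24, "ADE"), ("note1", 30, 33, "Dose")}
predicted = {("note1", 0, 8, "Drug"), ("note1", 20, 24, "ADE"), ("note1", 40, 45, "ADE")}
score = micro_f1(gold, predicted)
```

Here two of three predictions are correct and two of three gold entities are found, so precision and recall are both 2/3 and the F1-score is also 2/3.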
Our results indicate that combining two widely used, complementary sequence labeling techniques with dual-level embedding (character level and word level) in the input layer yields a deep learning architecture that achieves excellent information extraction accuracy for EHR notes.