Li Fei, Liu Weisong, Yu Hong
Department of Computer Science, University of Massachusetts Lowell, Lowell, MA, United States.
Center for Healthcare Organization and Implementation Research, Bedford Veterans Affairs Medical Center, Bedford, MA, United States.
JMIR Med Inform. 2018 Nov 26;6(4):e12159. doi: 10.2196/12159.
Pharmacovigilance and drug-safety surveillance are crucial for monitoring adverse drug events (ADEs), but the main ADE-reporting systems, such as the Food and Drug Administration Adverse Event Reporting System, face challenges such as underreporting. As a complementary form of surveillance, ADE data are therefore extracted from electronic health record (EHR) notes using natural language processing (NLP). As NLP has developed, recent machine-learning techniques such as deep learning and multi-task learning (MTL) have been introduced to the field. However, only a few studies have applied these techniques to ADE extraction.
We aimed to design a deep learning model for extracting ADEs and related information such as medications and indications. Because extraction of ADE-related information involves two steps (named entity recognition and relation extraction), our second objective was to improve the deep learning model by applying MTL between the two steps.
We used the dataset from the Medication, Indication and Adverse Drug Events (MADE) 1.0 challenge to train and test our models. This dataset consists of 1089 EHR notes of cancer patients and includes 9 entity types, such as Medication, Indication, and ADE, and 7 types of relations between these entities. To extract information from the dataset, we proposed a deep learning model that uses a bidirectional long short-term memory (BiLSTM) conditional random field network to recognize entities and a BiLSTM-Attention network to extract relations. To further improve the deep learning model, we employed three typical MTL methods, namely hard parameter sharing, parameter regularization, and task relation learning, to build three MTL models called HardMTL, RegMTL, and LearnMTL, respectively.
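To make the difference between the first two MTL strategies concrete, the following is a minimal, framework-free sketch. It is not the authors' implementation: the parameter values, the dictionary layout, the function names, and the use of a squared L2 distance as the regularization penalty are all assumptions chosen for illustration. In hard parameter sharing, both tasks point at the same encoder parameters; in parameter regularization, each task keeps its own encoder and a penalty discourages the two from drifting apart.

```python
# Toy illustration only; all names and numbers here are hypothetical.

def l2_distance_sq(a, b):
    """Squared L2 distance between two equally sized parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hard parameter sharing: the entity-recognition (NER) and relation-extraction
# (RE) tasks hold a reference to the *same* encoder parameters, so an update
# made for one task is immediately visible to the other.
shared_encoder = [0.1, -0.2, 0.3]
hard_ner = {"encoder": shared_encoder, "head": [0.5]}
hard_re = {"encoder": shared_encoder, "head": [-0.4]}

# Parameter regularization (assumed here to be an L2 penalty): each task owns
# its encoder, and the joint loss penalizes the distance between them.
reg_ner = {"encoder": [0.1, -0.2, 0.3], "head": [0.5]}
reg_re = {"encoder": [0.0, -0.1, 0.5], "head": [-0.4]}

def regularized_loss(ner_loss, re_loss, ner_encoder, re_encoder, strength=0.1):
    """Sum of both task losses plus a penalty on encoder disagreement."""
    return ner_loss + re_loss + strength * l2_distance_sq(ner_encoder, re_encoder)

joint_loss = regularized_loss(1.0, 2.0, reg_ner["encoder"], reg_re["encoder"])
```

Task relation learning, the third strategy, instead learns how much the tasks should share; it is omitted here because the abstract gives no detail on its formulation.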
Because extraction of ADE-related information is a two-step task, the result of the second step (ie, relation extraction) was used to compare all models. We used microaveraged precision, recall, and F1 as evaluation metrics. Our deep learning model achieved state-of-the-art results (F1=65.9%), significantly higher than the F1 of 61.7% achieved by the best system in the MADE 1.0 challenge. HardMTL further improved the F1 by 0.8 percentage points, to 66.7%, whereas RegMTL and LearnMTL failed to improve performance.
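Microaveraging pools true positives, false positives, and false negatives over all documents and relation types before computing the metrics. The sketch below shows one way to compute it, assuming that each relation is represented as a hashable tuple and that a prediction counts as correct only on an exact match. The function name, the tuple layout, and the example data are hypothetical and are not taken from the MADE 1.0 evaluation script.

```python
def micro_prf(gold_by_doc, pred_by_doc):
    """Microaveraged precision, recall, and F1 over per-document relation sets.

    Each argument is a list with one set per document; each set holds
    hashable relation tuples such as (relation_type, head, tail).
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_by_doc, pred_by_doc):
        tp += len(gold & pred)   # predicted and in the gold standard
        fp += len(pred - gold)   # predicted but not in the gold standard
        fn += len(gold - pred)   # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up example data for two documents.
gold = [
    {("adverse", "rash", "drugA"), ("reason", "pain", "drugB")},
    {("adverse", "nausea", "drugC")},
]
pred = [
    {("adverse", "rash", "drugA")},
    {("adverse", "nausea", "drugC"), ("reason", "cough", "drugD")},
]
precision, recall, f1 = micro_prf(gold, pred)
```

In this toy case the pooled counts are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 are all 2/3.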
Deep learning models can significantly improve the performance of ADE-related information extraction. MTL may be effective for named entity recognition and relation extraction, but its benefit depends on the methods, data, and other factors. Our results can facilitate research on ADE detection, NLP, and machine learning.