School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.
J Biomed Inform. 2019 Apr;92:103133. doi: 10.1016/j.jbi.2019.103133. Epub 2019 Feb 25.
Clinical named entity recognition aims to identify and classify clinical terms such as diseases, symptoms, treatments, exams, and body parts in electronic health records, which is a fundamental and crucial task for clinical and translational research. In recent years, deep neural networks have achieved significant success in named entity recognition and many other natural language processing tasks. Most of these algorithms are trained end to end and can automatically learn features from large-scale labeled datasets. However, these data-driven methods typically lack the capability to process rare or unseen entities. Previous statistical methods and feature engineering practice have demonstrated that human knowledge can provide valuable information for handling rare and unseen cases. In this paper, we propose a new model that combines data-driven deep learning approaches with knowledge-driven dictionary approaches. Specifically, we incorporate dictionaries into deep neural networks. In addition, we propose two different architectures that extend the bi-directional long short-term memory neural network, together with five different feature representation schemes, to handle the task. Computational results on the CCKS-2017 Task 2 benchmark dataset show that the proposed method achieves highly competitive performance compared with state-of-the-art deep learning methods.
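As a rough illustration of what incorporating a dictionary into a character-level tagger can look like, the sketch below derives one dictionary feature per character by forward maximum matching against a lexicon. This is not one of the five feature representation schemes from the paper, which the abstract does not describe; the function name `dictionary_features`, the B/I/E/S tag layout, the `max_len` parameter, and the toy lexicon are all assumptions made for illustration. In a model of this kind, such tags would typically be embedded and concatenated with the character embeddings before the bi-directional LSTM layer.

```python
from typing import Dict, List


def dictionary_features(text: str, lexicon: Dict[str, str], max_len: int = 8) -> List[str]:
    """Tag each character by forward maximum matching against a lexicon.

    Hypothetical helper: `lexicon` maps a surface term to an entity type.
    Matched spans receive B-/I-/E- tags (S- for single characters);
    unmatched characters receive "O".
    """
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(text) - i), 0, -1):
            entity_type = lexicon.get(text[i:i + n])
            if entity_type is None:
                continue
            if n == 1:
                tags[i] = f"S-{entity_type}"
            else:
                tags[i] = f"B-{entity_type}"
                for k in range(i + 1, i + n - 1):
                    tags[k] = f"I-{entity_type}"
                tags[i + n - 1] = f"E-{entity_type}"
            i += n
            matched = True
            break
        if not matched:
            i += 1
    return tags


# Toy lexicon (assumed, not taken from the paper's dictionaries).
toy_lexicon = {"头痛": "SYMPTOM", "胸部": "BODY", "胸部CT": "EXAM"}
print(dictionary_features("患者头痛行胸部CT", toy_lexicon))
```

Because the longest match wins, "胸部CT" is tagged as a single exam span rather than as the body part "胸部" followed by untagged characters. A learned model can still override such hints, which is one reason to feed them in as input features rather than apply them as hard rules.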