Gridach Mourad
High Institute of Technology, Ibn Zohr University, Agadir, Morocco.
J Biomed Inform. 2017 Jun;70:85-91. doi: 10.1016/j.jbi.2017.05.002. Epub 2017 May 11.
Biomedical named entity recognition (BNER), which extracts important named entities such as genes and proteins, is a challenging task in automated systems that mine knowledge from biomedical texts. Previous state-of-the-art systems required large amounts of task-specific knowledge in the form of feature engineering, lexicons and data pre-processing to achieve high performance. In this paper, we introduce a novel neural network architecture that automatically benefits from both word- and character-level representations by combining a bidirectional long short-term memory (LSTM) network with a conditional random field (CRF), eliminating the need for most feature engineering tasks. We evaluate our system on two datasets: the JNLPBA corpus and the BioCreAtIvE II Gene Mention (GM) corpus. We obtained state-of-the-art performance, outperforming the previous systems. To the best of our knowledge, we are the first to investigate the combination of deep neural networks, CRF, word embeddings and character-level representations for recognizing biomedical named entities.
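To illustrate why a CRF layer is placed on top of the bidirectional LSTM, here is a minimal sketch of Viterbi decoding over per-token tag scores. The abstract does not describe the decoding procedure, so this is an assumption based on how linear-chain CRF layers are commonly decoded. The function `viterbi_decode`, the tag set, and all scores are hypothetical toy values, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[str],
) -> Tuple[List[str], float]:
    """Return the highest-scoring tag sequence and its score.

    emissions[t][j]   : score of tag j at token t (assumed to come from a BiLSTM)
    transitions[i][j] : score of moving from tag i to tag j (CRF parameters)
    """
    n_tags = len(tags)
    # Best score of any path ending in each tag at the first token.
    scores = list(emissions[0])
    backpointers: List[List[int]] = []

    for t in range(1, len(emissions)):
        new_scores: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at token t.
            best_i = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)

    # Follow the backpointers from the best final tag.
    last = max(range(n_tags), key=lambda j: scores[j])
    best_score = scores[last]
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return [tags[i] for i in path], best_score


# Hypothetical toy example: three tokens, e.g. "the BRCA1 gene".
TAGS = ["O", "B-GENE", "I-GENE"]
EMISSIONS = [
    [2.0, 0.1, 0.0],  # token 0
    [0.2, 1.0, 1.2],  # token 1: I-GENE scores highest in isolation
    [0.3, 0.1, 1.0],  # token 2
]
TRANSITIONS = [
    [0.5, 0.0, -10.0],  # from O: moving to I-GENE is heavily penalized
    [0.0, -1.0, 1.0],   # from B-GENE
    [0.0, -1.0, 0.5],   # from I-GENE
]

best_path, best_score = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
print(best_path, best_score)
```

With these toy numbers, choosing the best tag for each token independently would give O, I-GENE, I-GENE, which is not a valid BIO sequence. The transition scores steer the joint decoding to O, B-GENE, I-GENE instead, which is the kind of sequence-level consistency a CRF layer contributes.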