Key Laboratory of Big Data and Intelligent Robot (South China University of Technology), Ministry of Education; School of Software Engineering, South China University of Technology, Guangzhou, China.
Neural Netw. 2021 Oct;142:340-350. doi: 10.1016/j.neunet.2021.02.019. Epub 2021 Mar 16.
Named entity recognition (NER) is crucial to many natural language processing (NLP) tasks. However, nested entities, which are common in practical corpora, are ignored by most current NER models. Two categories of models have been proposed to extract nested entities: feature-based and neural network-based approaches. Feature-based models require complicated feature engineering and often rely heavily on external resources. Recent neural network-based methods dispense with heavy feature engineering and treat nested NER as a classification task, but they still suffer from severe class imbalance and high computational cost. To address these problems, we propose a neural multi-task model with two modules, Binary Sequence Labeling and Candidate Region Classification, to extract nested entities. Extensive experiments are conducted on public datasets. Compared with recent neural network-based approaches, our proposed model achieves better performance and higher efficiency.
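As a rough illustration of why a binary labeling step could reduce the class imbalance and cost of span classification, the sketch below compares the number of regions a classifier would see with and without such a filter. It is not the authors' implementation: the abstract does not say how candidate regions are derived, so the rule used here (every sub-span of a maximal run of tokens labeled 1) is an assumption, as are the function names `all_spans` and `candidate_regions`, the `max_len` parameter, and the example sentence and labels.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def all_spans(n: int, max_len: int) -> List[Span]:
    """Enumerate every span of up to max_len tokens in a sentence of n tokens.

    This is the exhaustive baseline: most of these spans are non-entities,
    which is one source of class imbalance in span classification.
    """
    return [(i, j) for i in range(n) for j in range(i + 1, min(n, i + max_len) + 1)]


def candidate_regions(binary_labels: List[int], max_len: int) -> List[Span]:
    """Enumerate spans that lie entirely inside a run of tokens labeled 1.

    ASSUMPTION: label 1 means "token belongs to some entity", and candidates
    are all sub-spans of each maximal run of 1s. The paper may define
    candidates differently.
    """
    spans: List[Span] = []
    n = len(binary_labels)
    start = 0
    while start < n:
        if binary_labels[start] == 0:
            start += 1
            continue
        # Find the end of the current maximal run of 1s: [start, end).
        end = start
        while end < n and binary_labels[end] == 1:
            end += 1
        for i in range(start, end):
            for j in range(i + 1, min(end, i + max_len) + 1):
                spans.append((i, j))
        start = end
    return spans


# Hypothetical example: "IL-2 gene promoter" contains the nested mention "IL-2 gene".
tokens = ["the", "IL-2", "gene", "promoter", "was", "active"]
labels = [0, 1, 1, 1, 0, 0]  # made-up output of the binary labeling module

exhaustive = all_spans(len(tokens), max_len=6)
filtered = candidate_regions(labels, max_len=6)
print(len(exhaustive), len(filtered))
```

In this toy sentence the filter keeps both the outer span (1, 4) and the nested span (1, 3) while discarding every span that touches a token labeled 0, so the region classifier would score far fewer negatives.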