Han Qingbin, Ma Jialin
School of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian, 223003, Jiangsu, China.
Sci Rep. 2024 Mar 6;14(1):5564. doi: 10.1038/s41598-024-56166-3.
Chinese is characterized by high syntactic complexity, inconsistent annotation granularity, and slow model convergence. Joint learning models can effectively improve the accuracy of Chinese Named Entity Recognition (NER), but they focus too heavily on local feature information, which weakens their ability to extract features from long sequences. To address this limitation, we propose a Chinese NER model called Incorporating Recurrent Cell and Information State Recursion (IRCSR-NER). The model integrates recurrent cells with information state recursion to improve the recognition of long entity boundaries. To account for the different emphases of Chinese and English syntactic analysis, we use a syntactic dependency approach to add lexical relationship information to sentences represented at the word level. IRCSR-NER is applied to sequence feature extraction to improve model efficiency and long-text feature extraction. The model captures long-distance contextual dependencies while still attending to local feature information. We evaluated the proposed model on four public datasets and compared it with other mainstream models. Experimental results demonstrate that our model outperforms both traditional and mainstream models.
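The abstract describes the architecture only at a high level. As a rough illustration of what carrying an information state recursively across long sequences could look like, the following minimal sketch (our assumption for illustration, not the authors' released code; all names such as SegmentRecurrentEncoder are hypothetical) propagates a recurrent cell's hidden state across consecutive text segments, so local features are captured per segment while long-range context survives segment boundaries.

```python
# Illustrative sketch only: one plausible reading of "recurrent cell +
# information state recursion" -- a GRU whose final hidden state is carried
# (recursed) into the next segment of a long document before an NER head.
import torch
import torch.nn as nn


class SegmentRecurrentEncoder(nn.Module):
    """GRU encoder that propagates its final hidden state to the next segment."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)

    def forward(self, segments):
        """segments: list of (batch, seg_len, input_dim) tensors from one long text."""
        state = None          # information state recursed across segments
        outputs = []
        for seg in segments:
            out, state = self.gru(seg, state)   # state carries long-distance context
            state = state.detach()              # truncate gradients at segment boundaries
            outputs.append(out)
        return torch.cat(outputs, dim=1)        # per-token features for a downstream NER head


if __name__ == "__main__":
    enc = SegmentRecurrentEncoder(input_dim=128, hidden_dim=256)
    doc = [torch.randn(2, 50, 128) for _ in range(4)]  # a long text split into 4 segments
    feats = enc(doc)
    print(feats.shape)  # torch.Size([2, 200, 256])
```

In this reading, the per-segment recurrence preserves local feature information while the carried state supplies the long-distance context the abstract emphasizes; the paper itself should be consulted for the actual IRCSR-NER formulation and its syntactic-dependency features.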