Guangxi Key Laboratory of Images and Graphics Intelligent Processing, Guilin University of Electronic Technology, Guilin 541004, China.
Nanning Research Institute, Guilin University of Electronic Technology, Guilin 541004, China.
Sensors (Basel). 2023 Feb 4;23(4):1771. doi: 10.3390/s23041771.
The performance of Chinese named entity recognition (NER) has been improved through word enhancement and through new frameworks that incorporate various types of external data. However, for Chinese NER, syntactic composition (at the sentence level) and inner regularity (at the character level) have rarely been studied. Chinese characters are highly sensitive to sentence-level syntactic information: the same sequence of Chinese characters can be decomposed into different word combinations depending on how it is used and where it appears in context. In addition, owing to the specificity of Chinese language structure, entities of the same type usually follow the same naming rules. This paper presents Kcr-FLAT, a model that improves Chinese NER performance through enhanced semantic information. Specifically, we first extract different types of syntactic data, encode the syntactic information with a key-value memory network (KVMN), and fuse the results via an attention mechanism. The syntactic and lexical information are then integrated by a cross-transformer. Finally, an inner regularity perception module captures the internal regularity of each entity for better entity type prediction. With the F1 score as the evaluation metric, the proposed model achieves 96.51%, 96.81%, and 70.12% on the MSRA, Resume, and Weibo datasets, respectively.
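To make the KVMN fusion step concrete, the sketch below shows a generic key-value memory lookup in numpy: each character representation queries a small memory of syntactic key vectors, the attention weights select a mixture of the corresponding value vectors, and the result is fused back into the character representation. All names, dimensions, and the residual-style fusion are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kvmn_fuse(char_repr, keys, values):
    """Key-value memory lookup (illustrative sketch).

    char_repr: (seq_len, d) character representations
    keys:      (n_mem, d)   syntactic key vectors
    values:    (n_mem, d)   syntactic value vectors
    """
    scores = char_repr @ keys.T           # (seq_len, n_mem) similarity
    weights = softmax(scores, axis=-1)    # attention over memory slots
    syntactic = weights @ values          # (seq_len, d) retrieved info
    return char_repr + syntactic          # residual-style fusion (assumed)

# toy example: 3 characters, 4 memory slots, hidden size 8
rng = np.random.default_rng(0)
chars = rng.standard_normal((3, 8))
keys = rng.standard_normal((4, 8))
vals = rng.standard_normal((4, 8))
fused = kvmn_fuse(chars, keys, vals)
print(fused.shape)
```

In the full model, one such memory would be built per syntactic information type, and a further attention step would fuse the per-type outputs before the cross-transformer integrates them with lexical features.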