Lou Yongjun, Gao Meng, Zhang Shuo, Yang Hongjun, Wang Sicong, He Yongqiang, Yang Jing, Yang Wenxia, Du Haitao, Shen Weizheng
College of Electrical Engineering and Information, Northeast Agricultural University, Harbin 150030, China.
Animal Husbandry and Veterinary Institute of Shandong Academy of Agricultural Sciences, Ji'nan 250010, China.
Animals (Basel). 2025 Mar 13;15(6):822. doi: 10.3390/ani15060822.
Named entity recognition (NER) is a fundamental task in constructing high-quality knowledge graphs. Such a graph can supply reliable knowledge for the auxiliary diagnosis of dairy cow diseases, helping to alleviate the missed diagnoses and misdiagnoses caused by the shortage of professional veterinarians in China. To address the characteristics of the Chinese dairy cow disease corpus, we propose an ensemble Chinese NER model that incorporates character-level, pinyin-level, glyph-level, and lexical-level features of Chinese characters. These multi-level features were concatenated and fed into a bidirectional long short-term memory (Bi-LSTM) network with a multi-head self-attention mechanism, which learns long-distance dependencies while focusing on important features. Finally, a conditional random field (CRF) model was used to obtain the globally optimal label sequence. Experimental results showed that the proposed model outperformed the baselines and related works, achieving an F1 score of 92.18%, which indicates that it is suitable and effective for NER on the dairy cow disease corpus.
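The final step of the abstract, in which a CRF selects the globally optimal label sequence, can be illustrated with a minimal Viterbi decoding sketch. This is not the authors' implementation: the BIO tag set (`O`, `B-DIS`, `I-DIS`), the function name `viterbi_decode`, and all emission and transition scores below are hypothetical values chosen for illustration. In the described model the emission scores would come from the Bi-LSTM and attention layers, and the transition scores would be learned by the CRF.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> Tuple[List[str], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][tag] is the score of `tag` at position t;
    transitions[(prev, cur)] is the score of moving from prev to cur
    (pairs that are not listed default to 0.0).
    """
    if not emissions:
        return [], 0.0

    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []  # back[t - 1][tag]: best predecessor of tag

    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[t - 1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[t - 1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag to recover the path.
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical scores for a three-character disease mention.
tags = ["O", "B-DIS", "I-DIS"]
emissions = [
    {"O": 0.6, "B-DIS": 0.5, "I-DIS": 0.1},
    {"O": 0.2, "B-DIS": 0.1, "I-DIS": 0.9},
    {"O": 0.1, "B-DIS": 0.1, "I-DIS": 0.8},
]
transitions = {
    ("O", "I-DIS"): -10.0,  # an entity cannot continue right after O
    ("B-DIS", "I-DIS"): 0.5,
    ("I-DIS", "I-DIS"): 0.5,
}

path, score = viterbi_decode(emissions, transitions, tags)
print(path)  # ['B-DIS', 'I-DIS', 'I-DIS']
```

Picking the best tag independently at each position would yield `O, I-DIS, I-DIS` here, which is not a valid BIO sequence. Scoring whole sequences with transition terms is what lets the decoder prefer `B-DIS` at the first position.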