Yu Helong, Wang Chenxi, Xue Mingxuan
College of Information Technology, Jilin Agricultural University, Changchun, China.
Front Plant Sci. 2024 Jun 25;15:1368847. doi: 10.3389/fpls.2024.1368847. eCollection 2024.
The diversity of edible fungus species and the breadth of mycological knowledge pose significant challenges to the research, cultivation, and popularization of edible fungi. Addressing these challenges requires a rapid and accurate way to acquire relevant information, and question-and-answer (Q&A) systems have the potential to provide it. Named entity recognition (NER) is the foundation for building an intelligent Q&A system for edible fungi. However, no publicly available Chinese corpus suitable for NER exists in the edible fungus domain, and conventional NER methods struggle to capture long-distance dependencies in text.
This paper describes the construction of a Chinese corpus for the edible fungus domain and introduces an NER method for edible fungus information based on XLNet and conditional random fields (CRFs). Our approach combines an iterated dilated convolutional neural network (IDCNN) with a CRF. First, an IDCNN layer is added on top of the XLNet model, which serves as the foundation. By expanding the receptive field of the convolutional kernels, this layer overcomes the limited capacity to capture features across utterances. The output of the IDCNN layer is then fed into the CRF layer, which reduces label-sequence logic errors and yields globally optimal labels for the edible fungus NER task.
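The two mechanisms this paragraph relies on can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the kernel width of 3, the dilation schedule [1, 2, 4], the BIO tag set with a single `ENT` type, and every score below are hypothetical values chosen for illustration. Part (1) passes a unit impulse through stacked dilated convolutions to show that each layer widens the span of input positions that can influence one output, matching the formula 1 + Σ(k − 1)·d. Part (2) is standard Viterbi decoding for a linear-chain CRF: instead of taking the best tag at each position independently, it picks the highest-scoring complete sequence, so a strongly penalized transition such as O → I-ENT is avoided. In the real model the emission scores would come from the IDCNN layer on top of XLNet and the transition scores would be learned during training.

```python
from typing import Dict, List, Sequence, Tuple

# ---- (1) Dilated convolution: the receptive field grows with the dilation ----

def dilated_conv1d(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """Single-channel 1D dilated convolution with zero padding; output length == input length."""
    half = (len(weights) - 1) // 2  # offset of the centre tap (assumes an odd kernel width)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t + (j - half) * dilation  # taps are spaced `dilation` positions apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Input positions visible to one output after stacking layers with these dilations."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

DILATIONS = [1, 2, 4]  # hypothetical schedule, not taken from the paper
signal = [0.0] * 41
signal[20] = 1.0  # unit impulse in the middle of a 41-position "sentence"
for d in DILATIONS:
    signal = dilated_conv1d(signal, [1.0, 1.0, 1.0], d)
reach = sum(1 for v in signal if v != 0.0)  # positions influenced by the impulse

# ---- (2) CRF decoding: Viterbi finds the best *whole* tag sequence ----

def viterbi(emissions: List[Dict[str, float]], transitions: Dict[Tuple[str, str], float],
            start: Dict[str, float], tags: List[str]) -> List[str]:
    """Return the tag path maximising start + sum(emission) + sum(transition) scores."""
    best = [{tag: start[tag] + emissions[0][tag] for tag in tags}]
    backptrs: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, ptrs = {}, {}
        for cur in tags:
            # best previous tag for reaching `cur` at position t
            prev = max(tags, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + emissions[t][cur]
            ptrs[cur] = prev
        best.append(scores)
        backptrs.append(ptrs)
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for ptrs in reversed(backptrs):  # follow back-pointers from the last position
        path.append(ptrs[path[-1]])
    return path[::-1]

TAGS = ["O", "B-ENT", "I-ENT"]  # hypothetical BIO tag set with one entity type
# Toy per-token scores (in the real model these would come from the IDCNN layer).
EMISSIONS = [
    {"O": 2.0, "B-ENT": 1.0, "I-ENT": 0.0},
    {"O": 0.5, "B-ENT": 0.9, "I-ENT": 1.0},
    {"O": 0.3, "B-ENT": 0.0, "I-ENT": 1.0},
]
# Penalise the illegal transition O -> I-ENT and starting with I-ENT; all else neutral.
TRANSITIONS = {(p, c): 0.0 for p in TAGS for c in TAGS}
TRANSITIONS[("O", "I-ENT")] = -10.0
START = {"O": 0.0, "B-ENT": 0.0, "I-ENT": -10.0}

greedy = [max(TAGS, key=lambda tag: e[tag]) for e in EMISSIONS]
decoded = viterbi(EMISSIONS, TRANSITIONS, START, TAGS)
print("receptive field:", receptive_field(3, DILATIONS), "impulse reach:", reach)
print("greedy :", greedy)
print("viterbi:", decoded)
```

With these toy values the impulse reaches 15 positions, equal to 1 + 2·(1 + 2 + 4), and the greedy per-token choice is O, I-ENT, I-ENT (an invalid BIO sequence), whereas Viterbi decoding returns O, B-ENT, I-ENT. An ID-CNN as usually formulated uses multi-channel filters with nonlinearities and iterates a block of dilated layers with shared parameters; the sketch only tracks which positions can reach each other.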
Experimental results show that the proposed model achieves a precision of 0.971, a recall of 0.986, and an F1-score of 0.979.
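NER precision and recall are usually computed at the entity level, and F1 is their harmonic mean, 2PR/(P + R). The abstract does not state the matching protocol, so the sketch below assumes the common exact-match convention, in which a predicted entity counts as correct only if its start, end, and type all agree with a gold entity. The spans and the entity types `FUNGUS` and `DISEASE` are hypothetical toy data, not the paper's corpus or results.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type); end is exclusive

def prf1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1."""
    tp = len(gold & pred)  # spans that match in both boundaries and type
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy, hypothetical annotations for one sentence (not data from the paper).
gold = {(0, 3, "FUNGUS"), (5, 8, "DISEASE"), (10, 12, "FUNGUS"), (14, 16, "DISEASE")}
pred = {(0, 3, "FUNGUS"), (5, 8, "DISEASE"), (10, 12, "DISEASE")}  # third has the wrong type
p, r, f = prf1(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

For this toy input, two of the three predictions are correct and two of the four gold entities are found, so the sketch prints P=0.667 R=0.500 F1=0.571.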
The proposed model outperforms existing approaches in terms of these evaluation metrics, effectively recognizing entities related to edible fungus information and offering methodological support for the construction of knowledge graphs.