Huang Runqing, Li Meijing, Zheng Huilin, Zhao Ziqi
College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China.
College of Computer Science and Engineering, Guilin University of Technology, Guilin, 541004, China.
Sci Rep. 2025 Aug 27;15(1):31573. doi: 10.1038/s41598-025-04252-5.
Chinese crop diseases and pests named entity recognition (CCDP-NER) is a critical step in extracting domain-specific information about crop diseases and pests, and it plays a significant role in promoting agricultural informatization. To address challenges such as noisy data, erroneous annotations, and ambiguous entity boundaries in this domain, this study proposes a deep learning-based CCDP-NER model. The model employs a bidirectional gated recurrent unit (BiGRU) to capture long-range semantic dependencies and integrates multi-level dilated convolutional neural networks (DCNNs) to extract fine-grained local features, thereby constructing a collaborative global-local representation. The variational information bottleneck (VIB) technique is introduced to filter noise by constraining mutual information: it reduces the impact of input noise on feature extraction while strengthening the correlation between the extracted features and the labels, which improves model robustness. Additionally, an entity boundary detection module identifies the head and tail positions of entities, improving boundary recognition accuracy. Experiments on a dataset constructed for crop diseases and pests show that the proposed model effectively identifies crop disease and pest entities, achieving an F1-score of 90.64%. This research is valuable for applications such as agricultural knowledge graph construction and agricultural question-answering systems.
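The VIB step described above can be illustrated with a minimal sketch. The abstract does not give the model's exact objective, so the following assumes the common formulation in which the encoder outputs a diagonal Gaussian (mean `mu`, log-variance `log_var`) and the mutual-information constraint is approximated by a KL divergence to a standard normal prior. The function names and the weight `beta` are hypothetical and chosen only for illustration.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    This term is the compression penalty: it grows as the latent
    code carries more information about the (possibly noisy) input.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


def sample_latent(mu, log_var, rng):
    """Reparameterised sample z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def vib_loss(task_loss, mu, log_var, beta=1e-3):
    """Task loss (e.g. tagging loss) plus the beta-weighted KL penalty.

    beta is an assumed hyperparameter trading label relevance
    against compression of the input.
    """
    return task_loss + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -0.2], [0.0, -1.0]
    z = sample_latent(mu, log_var, rng)
    print("latent sample:", z)
    print("VIB loss:", vib_loss(1.0, mu, log_var, beta=0.01))
```

A larger `beta` pushes the latent code toward the prior and discards more input detail, while a smaller `beta` keeps more of the input at the risk of retaining noise. How the published model weights and places this term is not stated in the abstract.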