A diffusion enhanced CRF and BiLSTM framework for accurate entity recognition.

Author information

Qiu Yunfei, Dong Libo, Zhang Wenwen, Xing Haoran, Huang Junwei

Affiliation

Software Engineering, Liaoning Technical University, Huludao, 123105, Liaoning, China.

Publication information

Sci Rep. 2025 Jun 4;15(1):19670. doi: 10.1038/s41598-025-04036-x.

Abstract

In Named Entity Recognition (NER) tasks, diffusion models process discrete data effectively. However, the original model struggles to capture long-distance dependencies and to integrate contextual information, which makes it difficult to recognize related entities and handle complex syntactic structures. These issues cause ambiguity and uncertainty in entity boundary recognition, reducing overall accuracy and stability. To address this, we propose a diffusion model augmented with Conditional Random Field (CRF) and Bidirectional Long Short-Term Memory (BiLSTM) layers. First, the BiLSTM-CRF model captures long-distance dependencies and contextual information, improving the accuracy of entity boundary recognition. Second, the Tversky and CRF loss functions select optimal label predictions from the probability distribution, integrating them through weighted summation to improve sequence dependency processing and label accuracy. Third, we introduce self-attention and graph attention mechanisms to handle complex data structures: attention probabilities are processed and integrated with the adjacency matrix, which improves the recognition of key entity relationships. Finally, an automatic noise adjustment mechanism modifies noise levels based on performance, improving stability and robustness in inconsistent environments. Experiments show that this approach improves performance on several NER datasets, with significant gains in recall, accuracy, and F1 score, making the model more robust in noisy and complex environments.
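
The abstract does not give the loss formulas, so the following is only a minimal sketch of one plausible reading of the weighted summation: a soft Tversky loss computed from per-token probabilities for a single entity class, combined with a CRF negative log-likelihood that is assumed to be computed elsewhere. The function names, the `alpha`/`beta` defaults, and the single mixing `weight` are illustrative assumptions, not the authors' settings.

```python
from typing import Sequence


def tversky_loss(probs: Sequence[float], targets: Sequence[int],
                 alpha: float = 0.5, beta: float = 0.5,
                 eps: float = 1e-7) -> float:
    """Soft Tversky loss for one entity class over a token sequence.

    probs   -- predicted probability that each token belongs to the class
    targets -- gold labels (1 = token belongs to the class, 0 = it does not)
    alpha   -- weight on false positives (assumed default)
    beta    -- weight on false negatives (assumed default)
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    # Soft counts: probabilities stand in for hard 0/1 predictions.
    tp = sum(p * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - index


def combined_loss(tversky: float, crf_nll: float, weight: float = 0.5) -> float:
    """Weighted sum of the two loss terms (hypothetical single mixing weight)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * tversky + (1.0 - weight) * crf_nll


if __name__ == "__main__":
    probs = [0.9, 0.2, 0.7, 0.1]
    gold = [1, 0, 1, 0]
    t_loss = tversky_loss(probs, gold, alpha=0.3, beta=0.7)
    # crf_nll would come from a CRF layer; 1.25 is a made-up placeholder value.
    print(combined_loss(t_loss, crf_nll=1.25, weight=0.4))
```

Setting `beta` above `alpha` penalizes missed entity tokens more heavily than spurious ones, which is the usual reason to prefer a Tversky term over a plain Dice term when recall matters.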

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77ec/12137753/b1bf9b9768c3/41598_2025_4036_Fig1_HTML.jpg
