Huang Fang, Wang Baocheng, Safarzadeh Jafar
School of Mathematics, Jiaozuo Normal College, Jiaozuo, 454000, Henan, China.
School of Information Technology, Xiangyang Polytechnic, Xiangyang, 441050, Hubei, China.
Sci Rep. 2025 May 29;15(1):18842. doi: 10.1038/s41598-025-02275-6.
Named entity recognition (NER) is a fundamental component of many natural language processing (NLP) tasks, such as information extraction and question answering. NER helps determine the significance of information within a given context and aids in retrieving and organizing data. This study applies an echo state network (ESN), a type of neural network, to NER on the CoNLL-2003 dataset. Input features were transformed into embeddings by an embedding layer, and these embeddings were fed into the ESN. The network was optimized with the quantum-based sand cat swarm optimization (QSCSO) algorithm. Finally, a conditional random field (CRF) produced the predicted label sequence. Recall, precision, F1-score, Matthews correlation coefficient (MCC), and Cohen's Kappa were used to evaluate BiLSTM-MultiBERT6L, BiLSTM-CNNs-CRF, Bi-directional LSTM-CNNs, BiLSTM-ELMo, BERT, and the proposed ESN/QSCSO model. Overall, the proposed model achieved better results than the other models compared. The main contribution of this study is the combination of QSCSO with ESN, which improves the model's capacity to capture long-term dependencies and to optimize hyperparameters effectively. This research advances the field of NER and offers a scalable and effective architecture for related sequence labeling tasks. Recognizing entities such as person names, organizations, dates, and locations is essential because it allows machines to derive valuable insights from unstructured text. This supports activities such as information retrieval (for instance, locating pertinent documents), building knowledge graphs (connecting entities to establish relationships), and streamlining workflows (summarizing news or extracting data for databases).
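The abstract does not give the ESN equations, so the following is only a minimal sketch of a standard leaky-integrator echo state network reservoir applied to a sequence of token embeddings. It is not the authors' implementation. The function names, the reservoir size, and the `spectral_radius` and `leak_rate` values are all hypothetical; they illustrate the kind of hyperparameters a search procedure such as QSCSO might tune, although the abstract does not say which ones were tuned. The CRF output layer and the optimizer itself are omitted.

```python
import numpy as np


def make_reservoir(n_inputs, n_reservoir, spectral_radius=0.9, seed=0):
    """Build fixed random input and recurrent weights (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_inputs))
    w = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    # Rescale so the largest absolute eigenvalue equals spectral_radius,
    # a common heuristic for obtaining a fading memory of past inputs.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def run_reservoir(embeddings, w_in, w, leak_rate=0.3):
    """Return one reservoir state per token (leaky-integrator update)."""
    state = np.zeros(w.shape[0])
    states = []
    for x in embeddings:
        pre_activation = w_in @ x + w @ state
        state = (1.0 - leak_rate) * state + leak_rate * np.tanh(pre_activation)
        states.append(state)
    return np.stack(states)


# Toy input: 5 tokens with 8-dimensional embeddings (random stand-ins,
# not real CoNLL-2003 data).
token_embeddings = np.random.default_rng(1).normal(size=(5, 8))
w_in, w = make_reservoir(n_inputs=8, n_reservoir=32)
states = run_reservoir(token_embeddings, w_in, w)
print(states.shape)
```

In a pipeline like the one described, the per-token states would be passed to a trained readout and a sequence decoder such as a CRF. Only the readout is trained in a conventional ESN; the reservoir weights stay fixed, which is why the choice of reservoir hyperparameters matters.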