Accurate disaster entity recognition based on contextual embeddings in self-attentive BiLSTM-CRF.

Authors

Hafsa Noor E, Alzoubi Hadeel Mohammed, Almutlq Atikah Saeed

Affiliation

Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al Ahsa, Saudi Arabia.

Publication

PLoS One. 2025 Mar 26;20(3):e0318262. doi: 10.1371/journal.pone.0318262. eCollection 2025.

Abstract

Automated extraction of disaster-related named entities is crucial for gathering pertinent information during natural or human-made crises. Timely and reliable data is vital for effective disaster management, benefiting humanitarian response authorities, law enforcement agencies, and other concerned organizations. Online news media plays a pivotal role in disseminating crisis-related information during emergencies and facilitating post-hazard disaster response operations. To extract relevant named entities, contextual embedding features prove instrumental. In this study, we investigate the automatic extraction of disaster-related named entities from an annotated dataset of 1000 online news articles. These articles are carefully annotated with 14 crisis-specific entities obtained from relevant ontologies. To generate contextual vector representations of words, we construct a novel word embedding model inspired by Word2vec. These contextual word embedding features, combined with lexicon features, are encoded using a novel contextualized deep Bi-directional LSTM network augmented with self-attention and conditional random field (CRF) layers. We compare the performance of our proposed model with existing word embedding approaches. Through extensive evaluation on an independent test set of 200 articles that includes more than 80,000 tokens, our context-sensitive optimized NER model achieves impressive results at the sentence level. With a Precision of 92%, Recall of 91%, Accuracy of 87%, and an F1-score of 92%, our model outperforms those utilizing general and non-contextual word embeddings, including fine-tuned and contextual BERT models, showcasing its superior performance.
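The abstract names a CRF layer on top of the BiLSTM and self-attention encoder but does not spell out how tags are decoded. The following is a minimal sketch of Viterbi decoding for a linear-chain CRF, the standard way such a layer picks the best tag sequence; it is not the authors' implementation. The tag names (`B-DIS`, `I-DIS`, `O`), the three-token sentence, and all scores are hypothetical values chosen for illustration; in a real model the emission scores would come from the encoder and the transition scores would be learned.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF."""
    # best[t][tag] is the best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag] is the previous tag on that best path
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical BIO tags for a single "disaster" entity type.
tags = ["O", "B-DIS", "I-DIS"]

# Hypothetical per-token emission scores for the tokens "flash", "floods", "hit".
emissions = [
    {"O": 0.2, "B-DIS": 2.0, "I-DIS": 0.1},
    {"O": 1.0, "B-DIS": 0.3, "I-DIS": 0.9},  # taken alone, "O" scores highest
    {"O": 2.0, "B-DIS": 0.1, "I-DIS": 0.2},
]

# Hypothetical transition scores, indexed as transitions[previous][current].
# O -> I-DIS is heavily penalised because an entity cannot start with an I- tag.
transitions = {
    "O": {"O": 0.5, "B-DIS": 0.0, "I-DIS": -10.0},
    "B-DIS": {"O": 0.0, "B-DIS": -1.0, "I-DIS": 1.0},
    "I-DIS": {"O": 0.0, "B-DIS": -1.0, "I-DIS": 0.5},
}

decoded = viterbi_decode(emissions, transitions, tags)
print(decoded)  # ['B-DIS', 'I-DIS', 'O']
```

In this toy case, choosing the best tag for each token independently would label "floods" as `O`. The transition scores instead favour continuing the entity, which is the kind of sequence-level consistency a CRF layer is meant to add.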

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fd7/11940654/d02954c21b07/pone.0318262.g002.jpg
