A BIGRU-Based Stacked Attention Network for Biomedical Named Entity Recognition with Chinese EMRs.

Affiliations

Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China.

Beijing University of Technology, Beijing, China.

Publication information

Stud Health Technol Inform. 2023 Nov 23;308:757-767. doi: 10.3233/SHTI230909.

Abstract

Biomedical named entity recognition (BNER) is an effective way to structure medical text data, and it is a basic task underlying medical application services such as medical knowledge graphs and intelligent auxiliary diagnosis systems. Existing medical named entity recognition methods generally use a word embedding model to construct the text representation and then integrate multiple semantic understanding models to improve recognition performance. These methods have three limitations. First, the medical field contains many professional terms that rarely appear in general-domain text and are therefore poorly represented by general-domain word embedding models. Second, existing approaches typically focus only on extracting global semantic features, which loses the local semantic features between characters. Third, as the word embedding dimension grows, the standard single-layer structure fails to extract global semantic features fully and deeply. We propose the BIGRU-based Stacked Attention Network (BSAN) model for biomedical named entity recognition. First, we fine-tune BERT on large-scale real-world electronic medical record (EMR) data to build dedicated embedding representations of medical terms. Second, we use a Convolutional Neural Network model to extract semantic features. Finally, we construct a stacked BIGRU using a multi-layer structure and a novel stacking method, which enables comprehensive and in-depth extraction of global semantic features while also requiring less time. In experiments on real-world Chinese EMR datasets, the proposed BSAN model achieves an F1 score of 90.9%, outperforming other state-of-the-art models on BNER.
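To make the reported F1 score concrete, the following is a minimal sketch of how an F1 value is commonly computed for named entity recognition. It rests on assumptions not stated in the abstract: that the data are labeled with BIO tags, and that scoring is done at the entity level with exact span and type matching. The function names `extract_spans` and `entity_f1` and the entity labels `DIS` and `DRUG` are hypothetical, chosen only for illustration. This is not the authors' evaluation code.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence.

    The BIO scheme is an assumption; stray I- tags with no
    preceding matching B- tag are ignored.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that ends the sentence.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 with exact span and type matching."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = extract_spans(g_tags), extract_spans(p_tags)
        tp += len(g & p)      # entities predicted with correct span and type
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: two gold entities, one of them predicted.
gold = [["B-DIS", "I-DIS", "O", "B-DRUG"]]
pred = [["B-DIS", "I-DIS", "O", "O"]]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5
```

Under exact-match scoring, a prediction that gets the entity type right but misses one boundary character counts as both a false positive and a false negative, which is one reason character-level features can matter for Chinese text.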
