Named entity recognition of Chinese electronic medical records based on a hybrid neural network and medical MC-BERT.

Affiliations

College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, China.

Hubei Province Engineering Technology Research Center for Construction Quality Testing Equipments, China Three Gorges University, Yichang, 443002, China.

Publication information

BMC Med Inform Decis Mak. 2022 Dec 1;22(1):315. doi: 10.1186/s12911-022-02059-2.

Abstract

BACKGROUND

Named entity recognition (NER) in electronic medical records is an important task in clinical medical research. Deep learning combined with pretrained models performs well at recognizing entities in clinical text. However, Chinese electronic medical records have a distinctive text structure and vocabulary distribution, so general-domain pretrained models cannot effectively incorporate entities and medical domain knowledge into representation learning. In addition, individual deep network models used in isolation cannot fully extract the rich features of complex text. Both limitations degrade NER performance on electronic medical records.

METHODS

To better represent electronic medical record text, we extract local features and multilevel sequence interaction information from the text to improve NER on electronic medical records. We propose a hybrid neural network model based on the medical-domain MC-BERT, namely the MC-BERT + BiLSTM + CNN + MHA + CRF model. First, MC-BERT serves as the embedding model and produces word vectors for the text. BiLSTM then captures forward and backward sequence information from these vectors, while CNN captures local context, each yielding a feature vector. The two feature vectors are merged and passed to multihead self-attention (MHA) to obtain multilevel semantic features. Finally, CRF decodes the features and predicts the label sequence.
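The final CRF step can be illustrated with Viterbi decoding, the standard way to pick the best label sequence from a linear-chain CRF. The sketch below is not the authors' implementation: it covers only the decoding step, uses pure Python, and assumes a hypothetical three-label tag set with hand-written emission and transition scores. In the described model, the emission scores would come from the layers before the CRF and the transition scores would be learned.

```python
from typing import List, Sequence

def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label index sequence.

    emissions[t][j]   : score of label j at position t
    transitions[i][j] : score of moving from label i to label j
    """
    if not emissions:
        return []
    n_labels = len(emissions[0])
    # best[j] = best score of any path that ends in label j at the current position
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_labels):
            # previous label that maximizes the path score into label j
            prev = max(range(n_labels), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # trace back from the best final label
    last = max(range(n_labels), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path

# Hypothetical tag set and scores, for illustration only.
LABELS = ["O", "B-DIS", "I-DIS"]
transitions = [
    [0.0, 0.0, -100.0],  # O -> I-DIS is heavily penalized (invalid in BIO)
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
emissions = [
    [2.0, 0.1, 0.0],  # token 0 prefers O
    [0.5, 1.0, 1.2],  # token 1 slightly prefers I-DIS on its own
    [0.1, 0.0, 2.0],  # token 2 prefers I-DIS
]
decoded = [LABELS[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)
```

Choosing the top label per token independently would give O, I-DIS, I-DIS, which is not a valid BIO sequence. With the transition penalty, decoding yields O, B-DIS, I-DIS instead, which shows why a sequence-level decoder is placed after the feature layers.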

RESULTS

Experiments show that the proposed MC-BERT-based hybrid neural network model reaches F1 values of 94.22%, 86.47%, and 92.28% on the CCKS-2017, CCKS-2019, and cEHRNER datasets, respectively. Compared with BiLSTM + CRF built on general-domain BERT, these F1 values are higher by 0.89%, 1.65%, and 2.63%, respectively. Finally, we analyze how an imbalanced number of entities in electronic medical records affects the NER results.
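For readers unfamiliar with how an NER F1 value is typically computed, here is a minimal sketch. The abstract does not state the matching criterion or tagging scheme, so this assumes BIO tags and strict matching (a predicted entity counts only if its span and type both match the gold entity exactly). The tag names and example sequences are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)

def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans

def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Strict entity-level F1: span boundaries and type must both match."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: the prediction finds one of the two gold entities.
gold = ["B-DIS", "I-DIS", "O", "B-SYM", "O"]
pred = ["B-DIS", "I-DIS", "O", "O", "O"]
print(round(entity_f1(gold, pred), 4))
```

In this example precision is 1.0 and recall is 0.5, so F1 is 2/3. A metric of this kind rewards complete, correctly typed entities rather than individual token labels.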

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/657e/9714133/7a543c100bb9/12911_2022_2059_Fig1_HTML.jpg
