School of Medical Information and Engineering, Xuzhou Medical University, Jiangsu 221004, China.
Math Biosci Eng. 2022 Jan 4;19(3):2206-2218. doi: 10.3934/mbe.2022103.
Named entities are the main carriers of relevant medical knowledge in Electronic Medical Records (EMR). Because of the particular structure of the Chinese language, clinical EMR text suffers from problems such as word-segmentation ambiguity and polysemy, so a Clinical Named Entity Recognition (CNER) model based on multi-head self-attention combined with a BiLSTM neural network and Conditional Random Fields is proposed. First, a pre-trained language model organically combines character vectors and word vectors for the text sequences of the original dataset. The sequences are then fed in parallel into the multi-head self-attention module and the BiLSTM neural network module, and the outputs of the two modules are concatenated to obtain multi-level information such as contextual information and feature-association weights. Finally, entity annotation is performed by a CRF layer. The results of multiple comparison experiments show that the structure of the proposed model is reasonable and robust, and that it can effectively improve Chinese CNER performance. The model extracts multi-level and more comprehensive text features, compensates for the loss of long-distance dependencies, and offers better applicability and recognition performance.
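The parallel encoder described above can be sketched as follows. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the class name, layer sizes, and tag count are assumptions, the char/word fusion step is reduced to a single embedding lookup, and the final CRF decoding is represented only by a linear layer that produces per-tag emission scores (a CRF such as the `pytorch-crf` package would normally be applied on top of these scores).

```python
import torch
import torch.nn as nn

class ParallelAttnBiLSTM(nn.Module):
    """Hypothetical sketch of the parallel structure in the abstract:
    a multi-head self-attention branch and a BiLSTM branch run over the
    same embedded sequence, their outputs are concatenated, and a linear
    layer emits per-tag scores (a CRF layer would decode these)."""

    def __init__(self, vocab_size, emb_dim=128, hidden=64, heads=4, num_tags=9):
        super().__init__()
        # stand-in for the pre-trained char+word embeddings
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # branch 1: multi-head self-attention (feature-association weights)
        self.attn = nn.MultiheadAttention(emb_dim, heads, batch_first=True)
        # branch 2: BiLSTM (contextual information)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        # concatenated width: emb_dim (attention) + 2*hidden (BiLSTM)
        self.emissions = nn.Linear(emb_dim + 2 * hidden, num_tags)

    def forward(self, token_ids):
        x = self.emb(token_ids)                       # (B, T, emb_dim)
        a, _ = self.attn(x, x, x)                     # (B, T, emb_dim)
        h, _ = self.bilstm(x)                         # (B, T, 2*hidden)
        # splice the two branches, then project to tag emission scores
        return self.emissions(torch.cat([a, h], -1))  # (B, T, num_tags)

model = ParallelAttnBiLSTM(vocab_size=100)
scores = model(torch.randint(0, 100, (2, 10)))
print(tuple(scores.shape))  # (2, 10, 9)
```

The concatenation (rather than stacking the branches sequentially) is what lets the model combine the attention branch's long-distance dependency weights with the BiLSTM's local contextual features before CRF decoding.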