Medical Named Entity Extraction from Chinese Resident Admit Notes Using Character and Word Attention-Enhanced Neural Network.

Affiliations

School of Automation, Central South University, Changsha 410083, China.

PTY LTD., Changsha 410083, China.

Publication information

Int J Environ Res Public Health. 2020 Mar 2;17(5):1614. doi: 10.3390/ijerph17051614.

DOI: 10.3390/ijerph17051614

PMID: 32131522

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7084381/

Abstract

The resident admit notes (RANs) in electronic medical records (EMRs) are first-hand information for studying a patient's condition. Extracting medical entities from RANs is an important step in obtaining disease information for medical decision-making. In Chinese electronic medical records, each medical entity carries not only word information but also rich character information, so combining words and characters effectively is essential for medical entity extraction. We propose a medical entity recognition model for Chinese RANs based on a character and word attention-enhanced (CWAE) neural network. In our model, word embeddings are obtained with the character-enhanced word embedding (CWE) model, and character-based embeddings with a convolutional neural network (CNN). An attention mechanism then combines the character-based embeddings with the word embeddings, which significantly improves the expressiveness of the word representations. The new word embeddings produced by the attention mechanism are fed into a bidirectional long short-term memory (BiLSTM) network followed by a conditional random field (CRF) to extract entities. We extracted nine types of key medical entities from Chinese RANs and evaluated our model, comparing it with two traditional machine learning methods, CRF and support vector machine (SVM), and with related deep learning models. The results show that our model outperforms these methods, reaching an F1-score of 94.44%.
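
The embedding-fusion step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' CWAE implementation: it assumes a simplified per-dimension (depthwise) character CNN with tanh and max-pooling, and a two-candidate additive attention in which each embedding is scored as v · tanh(x) and the two are mixed with softmax weights. The function names `char_cnn_embedding` and `attention_fuse`, the kernel values, and the attention vector `v` are hypothetical; the CWE word vectors and the downstream BiLSTM-CRF tagger are not shown.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def char_cnn_embedding(char_vecs: List[Vector], kernel: List[Vector]) -> Vector:
    """Hypothetical simplified char CNN: depthwise 1D convolution + max-pooling.

    kernel[j][d] weights dimension d of the j-th character in each window,
    so each embedding dimension is convolved independently.
    """
    k = len(kernel)
    dim = len(char_vecs[0])
    # Zero-pad words shorter than the kernel so one full window exists.
    padded = char_vecs + [[0.0] * dim] * max(0, k - len(char_vecs))
    windows = []
    for start in range(len(padded) - k + 1):
        out = [0.0] * dim
        for j in range(k):
            for d in range(dim):
                out[d] += kernel[j][d] * padded[start + j][d]
        windows.append([math.tanh(x) for x in out])
    # Max-pool each dimension over all window positions.
    return [max(w[d] for w in windows) for d in range(dim)]


def attention_fuse(word_vec: Vector, char_vec: Vector, v: Vector) -> Vector:
    """Hypothetical attention over two candidates: score v . tanh(x), then mix."""
    candidates = [word_vec, char_vec]
    scores = [sum(vi * math.tanh(xi) for vi, xi in zip(v, x)) for x in candidates]
    alpha = softmax(scores)
    return [alpha[0] * w + alpha[1] * c for w, c in zip(word_vec, char_vec)]


if __name__ == "__main__":
    # Toy 3-dimensional vectors; a trained model would learn all of these values.
    chars = [[0.1, 0.2, 0.3], [0.4, -0.1, 0.0]]  # two characters of one word
    kernel = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]  # hypothetical width-2 kernel
    word = [0.2, 0.0, -0.2]                      # stand-in for a CWE word vector
    v = [1.0, 1.0, 1.0]                          # hypothetical attention vector
    char_based = char_cnn_embedding(chars, kernel)
    print(attention_fuse(word, char_based, v))
```

Because the two attention weights sum to 1, the fused vector is a convex combination of the word embedding and the character-based embedding; in a BiLSTM-CRF pipeline like the one the abstract describes, one such vector per token would form the input sequence to the tagger.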