Clinical Named Entity Recognition from Chinese Electronic Medical Records Based on Deep Learning Pretraining.

Author information

Jiangsu Key Lab of Big Data Security & Intelligent Processing, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.

Zhejiang Engineering Research Center of Intelligent Medicine, Wenzhou 325035, China.

Publication information

J Healthc Eng. 2020 Nov 24;2020:8829219. doi: 10.1155/2020/8829219. eCollection 2020.

Abstract

BACKGROUND

Clinical named entity recognition is the basic task in mining electronic medical record text. Chinese electronic medical records pose particular challenges: the text contains many compound entities, sentences frequently omit components, and entity boundaries are unclear. Moreover, corpora of Chinese electronic medical records are difficult to obtain.

METHODS

To address these characteristics of Chinese electronic medical records, this study proposes a Chinese clinical entity recognition model based on deep learning pretraining. The model uses word embeddings trained on a domain corpus and fine-tunes an entity recognition model pretrained on a related corpus. BiLSTM and Transformer are then each used as the feature extractor to identify four types of clinical entities (diseases, symptoms, drugs, and operations) in the text of Chinese electronic medical records.
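To illustrate what identifying the four entity types amounts to at the output stage, the following minimal sketch decodes character-level BIO tags into entity spans. The abstract does not state the tagging scheme, so the BIO scheme, the label names (`DISEASE`, `SYMPTOM`, `DRUG`, `OPERATION`), the function `decode_bio`, and the example sentence are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical label names for the four entity types named in the abstract.
ENTITY_TYPES = {"DISEASE", "SYMPTOM", "DRUG", "OPERATION"}


def decode_bio(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) pairs from character-level BIO tags."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")

    entities: List[Tuple[str, str]] = []
    current_chars: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any, and reset the buffer.
        nonlocal current_chars, current_type
        if current_chars and current_type is not None:
            entities.append(("".join(current_chars), current_type))
        current_chars, current_type = [], None

    for ch, tag in zip(chars, tags):
        if tag.startswith("B-") and tag[2:] in ENTITY_TYPES:
            flush()  # a new entity starts here
            current_chars, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and tag[2:] == current_type:
            current_chars.append(ch)  # continue the open entity
        else:
            flush()  # "O" or an inconsistent tag ends the open entity
    flush()
    return entities


# Made-up example sentence ("The patient has coughed for three days and takes amoxicillin").
chars = list("患者咳嗽三天，服用阿莫西林")
tags = ["O", "O", "B-SYMPTOM", "I-SYMPTOM", "O", "O", "O",
        "O", "O", "B-DRUG", "I-DRUG", "I-DRUG", "I-DRUG"]
print(decode_bio(chars, tags))
# [('咳嗽', 'SYMPTOM'), ('阿莫西林', 'DRUG')]
```

In a full pipeline the tags would come from the feature extractor (BiLSTM or Transformer) followed by a classification layer; here they are written by hand so the decoding step can be read in isolation.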

RESULTS

On the test dataset, the model achieved 75.06% Macro-P, 76.40% Macro-R, and 75.72% Macro-F1. These experiments show that the Chinese clinical entity recognition model based on deep learning pretraining can effectively improve recognition performance.
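The three reported figures are macro-averaged scores. The abstract does not define how they were computed, so the sketch below assumes a common convention: precision and recall are computed per entity type and averaged with equal weight, and the F1 score is the harmonic mean of the two averages. The function names and the toy counts are hypothetical. Under this assumption, the harmonic mean of 75.06% and 76.40% comes out at about 75.72%, which agrees with the third reported figure.

```python
from typing import Dict, Tuple


def harmonic_mean(p: float, r: float) -> float:
    """F1-style harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def macro_scores(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall, and F1.

    counts maps each entity type to (true positives, false positives, false negatives).
    Assumption: macro-F1 is the harmonic mean of macro-P and macro-R.
    """
    precisions, recalls = [], []
    for tp, fp, fn in counts.values():
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    macro_p = sum(precisions) / len(precisions)
    macro_r = sum(recalls) / len(recalls)
    return macro_p, macro_r, harmonic_mean(macro_p, macro_r)


# Toy counts for two entity types (illustrative only, not from the paper).
print(macro_scores({"DISEASE": (8, 2, 2), "DRUG": (5, 5, 0)}))

# Consistency check against the reported percentages.
print(round(100 * harmonic_mean(0.7506, 0.7640), 2))
```

Another widespread convention averages the per-type F1 scores instead; the two definitions generally give different values, so the choice should be confirmed against the full paper.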

CONCLUSIONS

These experiments show that the proposed Chinese clinical entity recognition model based on deep learning pretraining can effectively improve the recognition performance.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2330/7707942/17c8ed52f11f/JHE2020-8829219.001.jpg
