Tang Buzhou, Jiang Dehuan, Chen Qingcai, Wang Xiaolong, Yan Jun, Shen Ying
Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology, Shenzhen, China.
Corresponding author:
AMIA Annu Symp Proc. 2020 Mar 4;2019:857-863. eCollection 2019.
De-identification of clinical text, the prerequisite of electronic clinical data reuse, is a typical named entity recognition (NER) problem. A number of state-of-the-art deep learning methods for NER, such as Bi-LSTM-CRF (bidirectional long-short-term-memory conditional random fields), have been applied for de-identification. Neural language models used for language representation bring great improvement in lots of NLP tasks when they are integrated with other deep learning methods. In this paper, we introduce Bi-LSTM-CRF with neural language models for de-identification of clinical text, and evaluate it on the de-identification datasets of the i2b2 2014 and the CEGS N-GRID 2016 challenges. Four neural language models of three types individually integrated with Bi-LSTM-CRF are compared in this study. Bi-LSTM-CRF with neural language models achieves the highest "strict" micro-averaged F1-score of 95.50% on the i2b2 2014 dataset and 91.82% on the CEGS N-GRID 2016 dataset, becoming new benchmark results on these two datasets respectively.

Keywords: De-identification, Named entity recognition, Bidirectional long-short-term-memory, Conditional random fields, Neural language models.
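As background for the architecture named in the abstract: in a Bi-LSTM-CRF tagger, the Bi-LSTM produces per-token emission scores and the CRF layer chooses the globally best tag sequence via Viterbi decoding, so that transitions such as O → I-NAME can be penalized. The sketch below is an illustrative, minimal Viterbi decoder, not the paper's implementation; the tag set, emission scores, and transition scores are invented for the example.

```python
# Minimal Viterbi decoding for a linear-chain CRF layer, as used on top
# of a Bi-LSTM encoder. Emissions would normally come from the Bi-LSTM;
# here they are hand-written illustrative values.

def viterbi_decode(emissions, transitions):
    """emissions: list of per-token dicts {tag: score};
    transitions: dict {(prev_tag, tag): score}.
    Returns the highest-scoring tag path."""
    tags = list(emissions[0].keys())
    # best path score ending in each tag at the first token
    score = {t: emissions[0][t] for t in tags}
    backptr = []
    for em in emissions[1:]:
        new_score, ptr = {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda p: score[p] + transitions[(p, t)])
            ptr[t] = best_prev
            new_score[t] = score[best_prev] + transitions[(best_prev, t)] + em[t]
        score = new_score
        backptr.append(ptr)
    # trace the best path backwards from the best final tag
    last = max(tags, key=lambda t: score[t])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

The transition table is what distinguishes the CRF from per-token argmax: a strong negative score on an invalid transition (e.g. O followed by an I- tag) rules out inconsistent BIO sequences even when the emissions favor them.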