A Study of Deep Learning Methods for De-identification of Clinical Notes at Cross Institute Settings.

Author information

Yang Xi, Lyu Tianchen, Lee Chih-Yin, Bian Jiang, Hogan William R, Wu Yonghui

Affiliation

Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, USA.

Publication information

Proc (IEEE Int Conf Healthc Inform). 2019 Jun;2019. doi: 10.1109/ICHI.2019.8904544. Epub 2019 Nov 21.

DOI:10.1109/ICHI.2019.8904544

PMID:31879734

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6932867/

Abstract

In this study, we examined a deep learning method for de-identifying clinical notes at UF Health in a cross-institute setting. We developed deep learning models using the 2014 i2b2/UTHealth corpus and evaluated their performance on clinical notes collected from UF Health. We compared four pre-trained word embeddings: two from the general domain and two from the clinical domain. We also explored linguistic features (word shape and part-of-speech) to further improve de-identification performance. The experimental results show that the performance of deep learning models trained on the i2b2/UTHealth corpus dropped significantly when applied to a corpus from a different institution (UF Health): strict and relaxed F1 scores fell from 0.9547 and 0.9646 to 0.8360 and 0.8870, respectively. Linguistic features, including word shape and part-of-speech, further improved de-identification performance in the cross-institute setting (to 0.8527 and 0.9052).
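
The abstract names a word shape feature and reports strict and relaxed F1 scores without defining either. The sketch below shows one common reading of both, and it is an assumption rather than the authors' implementation: the shape maps uppercase letters to `A`, lowercase letters to `a`, and digits to `0`; a strict match requires identical span boundaries and entity type, while a relaxed match only requires overlapping spans of the same type. The names `word_shape`, `f1_score`, and `Span` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical span format: (start offset, end offset exclusive, entity type).
Span = Tuple[int, int, str]


def word_shape(token: str) -> str:
    """Map each character to a coarse class (assumed mapping, not the paper's)."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("A")
        elif ch.islower():
            shape.append("a")
        elif ch.isdigit():
            shape.append("0")
        else:
            shape.append(ch)  # keep punctuation and other symbols unchanged
    return "".join(shape)


def _overlaps(a: Span, b: Span) -> bool:
    """True when two spans share an entity type and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def f1_score(gold: List[Span], pred: List[Span], relaxed: bool = False) -> float:
    """Span-level F1 under the assumed strict or relaxed matching rule."""
    gold_set, pred_set = set(gold), set(pred)
    if relaxed:
        # Count predictions that overlap any gold span, and vice versa.
        tp_pred = sum(any(_overlaps(p, g) for g in gold_set) for p in pred_set)
        tp_gold = sum(any(_overlaps(g, p) for p in pred_set) for g in gold_set)
    else:
        # Strict: boundaries and type must be identical.
        tp_pred = tp_gold = len(gold_set & pred_set)
    precision = tp_pred / len(pred_set) if pred_set else 0.0
    recall = tp_gold / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
gold = [(0, 10, "NAME"), (20, 30, "DATE")]
pred = [(0, 10, "NAME"), (22, 30, "DATE")]  # second span has a boundary error
strict_f1 = f1_score(gold, pred)
relaxed_f1 = f1_score(gold, pred, relaxed=True)
```

In this toy case the boundary error on the second span counts as a miss under strict matching but as a hit under relaxed matching, which is why the relaxed scores in the abstract are consistently higher than the strict ones.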

