A Study of Deep Learning Methods for De-identification of Clinical Notes at Cross Institute Settings.

Author information

Yang Xi, Lyu Tianchen, Lee Chih-Yin, Bian Jiang, Hogan William R, Wu Yonghui

Affiliation

Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, USA.

Publication information

Proc (IEEE Int Conf Healthc Inform). 2019 Jun;2019. doi: 10.1109/ICHI.2019.8904544. Epub 2019 Nov 21.

Abstract

In this study, we examined a deep learning method for de-identifying clinical notes at UF Health in a cross-institute setting. We developed deep learning models using the 2014 i2b2/UTHealth corpus and evaluated them on clinical notes collected from UF Health. We compared four pre-trained word embeddings: two from the general domain and two from the clinical domain. We also explored linguistic features (word shape and part-of-speech) to further improve de-identification performance. The experimental results show that the performance of models trained on the i2b2/UTHealth corpus dropped significantly when they were applied to a corpus from a different institution (UF Health): strict and relaxed F1 scores fell from 0.9547 and 0.9646 to 0.8360 and 0.8870, respectively. Adding the linguistic features (word shape and part-of-speech) further improved performance in the cross-institute setting, raising the strict and relaxed F1 scores to 0.8527 and 0.9052.
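
The abstract names word shape as one of the linguistic features but does not say how it is computed. The sketch below shows one common way to derive such a feature, purely as an illustration: the function name `word_shape`, the symbols `A`/`a`/`0`, and the optional run-collapsing step are assumptions, not the scheme used by the authors.

```python
import re


def word_shape(token: str, collapse: bool = True) -> str:
    """Map a token to a coarse shape string (hypothetical scheme).

    Uppercase letters become "A", lowercase letters "a", digits "0";
    every other character (punctuation, symbols) is kept unchanged.
    """
    symbols = []
    for ch in token:
        if ch.isupper():
            symbols.append("A")
        elif ch.islower():
            symbols.append("a")
        elif ch.isdigit():
            symbols.append("0")
        else:
            symbols.append(ch)
    shape = "".join(symbols)
    if collapse:
        # Collapse runs of the same symbol, e.g. "Aaaaa" becomes "Aa".
        shape = re.sub(r"(.)\1+", r"\1", shape)
    return shape


if __name__ == "__main__":
    for tok in ["Gainesville", "2019-11-21", "MRN12345"]:
        print(tok, word_shape(tok))
```

A shape such as `0-0-0` for a date or `A0` for an identifier does not depend on the specific characters, which is one plausible reason such a feature could help when the vocabulary differs between institutions.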
