De-identification of medical records using conditional random fields and long short-term memory networks.

Affiliation

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.

Publication information

J Biomed Inform. 2017 Nov;75S:S43-S53. doi: 10.1016/j.jbi.2017.10.003. Epub 2017 Oct 13.

DOI:10.1016/j.jbi.2017.10.003

PMID:29032162

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5890009/

Abstract

The CEGS N-GRID 2016 Shared Task 1 in Clinical Natural Language Processing focuses on the de-identification of psychiatric evaluation records. This paper describes two participating systems of our team, based on conditional random fields (CRFs) and long short-term memory networks (LSTMs). A pre-processing module was introduced for sentence detection and tokenization before de-identification. For CRFs, manually extracted rich features were utilized to train the model. For LSTMs, a character-level bi-directional LSTM network was applied to represent tokens and classify tags for each token, following which a decoding layer was stacked to decode the most probable protected health information (PHI) terms. The LSTM-based system attained an i2b2 strict micro-F measure of 0.8986, which was higher than that of the CRF-based system.
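The strict micro-F measure reported above can be illustrated with a minimal sketch. It assumes that "strict" means a predicted PHI span counts as correct only when its record, character offsets, and PHI type all match a gold span exactly, and that micro-averaging pools counts over all records and types. The function name `strict_micro_f`, the tuple layout, and the toy spans are hypothetical; this is not the official evaluation script.

```python
from typing import Iterable, Set, Tuple

# Hypothetical span layout: (record_id, start_offset, end_offset, phi_type)
Span = Tuple[str, int, int, str]


def strict_micro_f(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), pooled over all records and PHI types.

    A prediction is a true positive only if the whole tuple matches a gold
    span exactly (assumed reading of "strict" matching).
    """
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (invented for illustration): the DATE prediction is off by one
# character, so under strict matching it is both a false positive and a miss.
gold_spans = [("rec1", 0, 8, "NAME"), ("rec1", 20, 30, "DATE"), ("rec2", 5, 12, "CITY")]
pred_spans = [("rec1", 0, 8, "NAME"), ("rec1", 20, 29, "DATE"), ("rec2", 5, 12, "CITY")]

precision, recall, f1 = strict_micro_f(gold_spans, pred_spans)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

In this toy case two of the three predictions match exactly, so precision, recall, and F1 all equal 2/3. A more lenient scheme that accepts overlapping spans would score the DATE prediction as correct.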
