De-identification of psychiatric intake records: Overview of 2016 CEGS N-GRID shared tasks Track 1.

Author information

Simmons College, School of Library and Information Science, 300 The Fenway, Boston, MA 02115, United States.

University at Albany, United States.

Publication information

J Biomed Inform. 2017 Nov;75S:S4-S18. doi: 10.1016/j.jbi.2017.06.011. Epub 2017 Jun 11.

DOI:10.1016/j.jbi.2017.06.011

PMID:28614702

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5705537/

Abstract

The 2016 CEGS N-GRID shared tasks for clinical records contained three tracks. Track 1 focused on de-identification of a new corpus of 1000 psychiatric intake records. This track tackled de-identification in two sub-tracks: Track 1.A was a "sight unseen" task, where nine teams ran existing de-identification systems, without any modifications or training, on 600 new records in order to gauge how well systems generalize to new data. The best-performing system for this track scored an F1 of 0.799. Track 1.B was a traditional Natural Language Processing (NLP) shared task on de-identification, where 15 teams had two months to train their systems on the new data, then test them on an unannotated test set. The best-performing system from this track scored an F1 of 0.914. The scores for Track 1.A show that unmodified existing systems do not generalize well to new data without the benefit of training data. The scores for Track 1.B are slightly lower than those of the 2014 de-identification shared task (which was almost identical to 2016 Track 1.B), indicating that these new psychiatric records pose a more difficult challenge to NLP systems. Overall, de-identification is still not a solved problem, though it is important to the future of clinical NLP.
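The F1 scores quoted above are the harmonic mean of precision and recall. As a rough illustration only, the sketch below computes these measures under strict matching, where a predicted span counts as correct only if its offsets and category both match a gold annotation. The `Span` type, the function name, and the toy annotations are hypothetical, and the official shared-task evaluation may differ in its matching rules and averaging.

```python
from typing import NamedTuple, Set, Tuple


class Span(NamedTuple):
    """A hypothetical annotated span: character offsets plus a category label."""
    start: int
    end: int
    phi_type: str


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Strict-match precision, recall, and F1 over two sets of spans."""
    # A prediction is a true positive only if offsets and type match exactly.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (invented for illustration, not taken from the corpus).
gold = {
    Span(0, 8, "NAME"),
    Span(20, 30, "DATE"),
    Span(45, 51, "CITY"),
    Span(60, 72, "PHONE"),
}
predicted = {
    Span(0, 8, "NAME"),
    Span(20, 30, "DATE"),
    Span(45, 50, "CITY"),  # boundary off by one, so not a strict match
}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In this toy case two of three predictions match exactly and two of four gold spans are found, so the near-miss on the city span lowers both precision and recall.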

Similar articles

De-identification of psychiatric intake records: Overview of 2016 CEGS N-GRID shared tasks Track 1.

J Biomed Inform. 2017 Nov;75S:S4-S18. doi: 10.1016/j.jbi.2017.06.011. Epub 2017 Jun 11.

Symptom severity prediction from neuropsychiatric clinical records: Overview of 2016 CEGS N-GRID shared tasks Track 2.

J Biomed Inform. 2017 Nov;75S:S62-S70. doi: 10.1016/j.jbi.2017.04.017. Epub 2017 Apr 25.

Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S11-S19. doi: 10.1016/j.jbi.2015.06.007. Epub 2015 Jul 28.

The UAB Informatics Institute and 2016 CEGS N-GRID de-identification shared task challenge.

J Biomed Inform. 2017 Nov;75S:S54-S61. doi: 10.1016/j.jbi.2017.05.001. Epub 2017 May 3.

A hybrid approach to automatic de-identification of psychiatric notes.

J Biomed Inform. 2017 Nov;75S:S19-S27. doi: 10.1016/j.jbi.2017.06.006. Epub 2017 Jun 7.

De-identification of clinical notes via recurrent neural network and conditional random field.

J Biomed Inform. 2017 Nov;75S:S34-S42. doi: 10.1016/j.jbi.2017.05.023. Epub 2017 Jun 1.

De-identification of medical records using conditional random fields and long short-term memory networks.

J Biomed Inform. 2017 Nov;75S:S43-S53. doi: 10.1016/j.jbi.2017.10.003. Epub 2017 Oct 13.

De-identification of Clinical Text via Bi-LSTM-CRF with Neural Language Models.

AMIA Annu Symp Proc. 2020 Mar 4;2019:857-863. eCollection 2019.

An Empirical Test of GRUs and Deep Contextualized Word Representations on De-Identification.

Stud Health Technol Inform. 2019 Aug 21;264:218-222. doi: 10.3233/SHTI190215.

Cohort selection for clinical trials: n2c2 2018 shared task track 1.

J Am Med Inform Assoc. 2019 Nov 1;26(11):1163-1171. doi: 10.1093/jamia/ocz163.

Cited by

Leveraging large language models for the deidentification and temporal normalization of sensitive health information in electronic health records.

NPJ Digit Med. 2025 Aug 13;8(1):517. doi: 10.1038/s41746-025-01921-7.

An Extensible Evaluation Framework Applied to Clinical Text Deidentification Natural Language Processing Tools: Multisystem and Multicorpus Study.

J Med Internet Res. 2024 May 28;26:e55676. doi: 10.2196/55676.

De-identification of free text data containing personal health information: a scoping review of reviews.

Int J Popul Data Sci. 2023 Dec 12;8(1):2153. doi: 10.23889/ijpds.v8i1.2153. eCollection 2023.

Unlocking the Secrets Behind Advanced Artificial Intelligence Language Models in Deidentifying Chinese-English Mixed Clinical Text: Development and Validation Study.

J Med Internet Res. 2024 Jan 25;26:e48443. doi: 10.2196/48443.

OpenDeID Pipeline for Unstructured Electronic Health Record Text Notes Based on Rules and Transformers: Deidentification Algorithm Development and Validation Study.

J Med Internet Res. 2023 Dec 6;25:e48145. doi: 10.2196/48145.

Web-Based Application Based on Human-in-the-Loop Deep Learning for Deidentifying Free-Text Data in Electronic Medical Records: Development and Usability Study.

Interact J Med Res. 2023 Aug 25;12:e46322. doi: 10.2196/46322.

Privacy-Preserving Deep Learning NLP Models for Cancer Registries.

IEEE Trans Emerg Top Comput. 2021 Jul-Sep;9(3):1219-1230. doi: 10.1109/tetc.2020.2983404. Epub 2020 Apr 16.

An Efficient Method for Deidentifying Protected Health Information in Chinese Electronic Health Records: Algorithm Development and Validation.

JMIR Med Inform. 2022 Aug 30;10(8):e38154. doi: 10.2196/38154.

A scoping review of publicly available language tasks in clinical natural language processing.

J Am Med Inform Assoc. 2022 Sep 12;29(10):1797-1806. doi: 10.1093/jamia/ocac127.

Reconciling Allergy Information in the Electronic Health Record After a Drug Challenge Using Natural Language Processing.

Front Allergy. 2022 May 10;3:904923. doi: 10.3389/falgy.2022.904923. eCollection 2022.

References

De-identification of medical records using conditional random fields and long short-term memory networks.

J Biomed Inform. 2017 Nov;75S:S43-S53. doi: 10.1016/j.jbi.2017.10.003. Epub 2017 Oct 13.

Learning to identify Protected Health Information by integrating knowledge- and data-driven algorithms: A case study on psychiatric evaluation notes.

J Biomed Inform. 2017 Nov;75S:S28-S33. doi: 10.1016/j.jbi.2017.06.005. Epub 2017 Jun 7.

A hybrid approach to automatic de-identification of psychiatric notes.

J Biomed Inform. 2017 Nov;75S:S19-S27. doi: 10.1016/j.jbi.2017.06.006. Epub 2017 Jun 7.

De-identification of clinical notes via recurrent neural network and conditional random field.

J Biomed Inform. 2017 Nov;75S:S34-S42. doi: 10.1016/j.jbi.2017.05.023. Epub 2017 Jun 1.

The UAB Informatics Institute and 2016 CEGS N-GRID de-identification shared task challenge.

J Biomed Inform. 2017 Nov;75S:S54-S61. doi: 10.1016/j.jbi.2017.05.001. Epub 2017 May 3.

De-identification of patient notes with recurrent neural networks.

J Am Med Inform Assoc. 2017 May 1;24(3):596-606. doi: 10.1093/jamia/ocw156.

Creation of a new longitudinal corpus of clinical narratives.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S6-S10. doi: 10.1016/j.jbi.2015.09.018. Epub 2015 Oct 1.

Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S20-S29. doi: 10.1016/j.jbi.2015.07.020. Epub 2015 Aug 28.

Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S11-S19. doi: 10.1016/j.jbi.2015.06.007. Epub 2015 Jul 28.

Combining knowledge- and data-driven methods for de-identification of clinical narratives.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S53-S59. doi: 10.1016/j.jbi.2015.06.029. Epub 2015 Jul 22.
