Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1.

Authors

Stubbs Amber, Kotfila Christopher, Uzuner Özlem

Affiliations

School of Library and Information Science, Simmons College, Boston, MA, USA.

Department of Information Studies, State University of New York at Albany, Albany, NY, USA.

Publication information

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S11-S19. doi: 10.1016/j.jbi.2015.06.007. Epub 2015 Jul 28.

DOI:10.1016/j.jbi.2015.06.007

PMID:26225918

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4989908/

Abstract

The 2014 i2b2/UTHealth Natural Language Processing (NLP) shared task featured four tracks. The first of these was the de-identification track focused on identifying protected health information (PHI) in longitudinal clinical narratives. The longitudinal nature of clinical narratives calls particular attention to details of information that, while benign on their own in separate records, can lead to identification of patients in combination in longitudinal records. Accordingly, the 2014 de-identification track addressed a broader set of entities and PHI than covered by the Health Insurance Portability and Accountability Act - the focus of the de-identification shared task that was organized in 2006. Ten teams tackled the 2014 de-identification task and submitted 22 system outputs for evaluation. Each team was evaluated on their best performing system output. Three of the 10 systems achieved F1 scores over .90, and seven of the top 10 scored over .75. The most successful systems combined conditional random fields and hand-written rules. Our findings indicate that automated systems can be very effective for this task, but that de-identification is not yet a solved problem.

Similar articles

Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S11-S19. doi: 10.1016/j.jbi.2015.06.007. Epub 2015 Jul 28.

Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S20-S29. doi: 10.1016/j.jbi.2015.07.020. Epub 2015 Aug 28.

Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S67-S77. doi: 10.1016/j.jbi.2015.07.001. Epub 2015 Jul 22.

CRFs based de-identification of medical records.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S39-S46. doi: 10.1016/j.jbi.2015.08.012. Epub 2015 Aug 24.

Automatic de-identification of electronic medical records using token-level and character-level conditional random fields.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S47-S52. doi: 10.1016/j.jbi.2015.06.009. Epub 2015 Jun 26.

Combining knowledge- and data-driven methods for de-identification of clinical narratives.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S53-S59. doi: 10.1016/j.jbi.2015.06.029. Epub 2015 Jul 22.

Automatic detection of protected health information from clinic narratives.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S30-S38. doi: 10.1016/j.jbi.2015.06.015. Epub 2015 Jul 29.

Annotating risk factors for heart disease in clinical narratives for diabetic patients.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S78-S91. doi: 10.1016/j.jbi.2015.05.009. Epub 2015 May 21.

Combining glass box and black box evaluations in the identification of heart disease risk factors and their temporal relations from clinical records.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S133-S142. doi: 10.1016/j.jbi.2015.06.014. Epub 2015 Jul 2.

Creation of a new longitudinal corpus of clinical narratives.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S6-S10. doi: 10.1016/j.jbi.2015.09.018. Epub 2015 Oct 1.

Cited by

Leveraging large language models for the deidentification and temporal normalization of sensitive health information in electronic health records.

NPJ Digit Med. 2025 Aug 13;8(1):517. doi: 10.1038/s41746-025-01921-7.

A textual dataset of de-identified health records in Spanish and Catalan for medical entity recognition and anonymization.

Sci Data. 2025 Jul 1;12(1):1088. doi: 10.1038/s41597-025-05320-1.

Hierarchical embedding attention for overall survival prediction in lung cancer from unstructured EHRs.

BMC Med Inform Decis Mak. 2025 Apr 18;25(1):169. doi: 10.1186/s12911-025-02998-6.

Large Language Model-Based Assessment of Clinical Reasoning Documentation in the Electronic Health Record Across Two Institutions: Development and Validation Study.

J Med Internet Res. 2025 Mar 21;27:e67967. doi: 10.2196/67967.

LLM-IE: a python package for biomedical generative information extraction with large language models.

JAMIA Open. 2025 Mar 12;8(2):ooaf012. doi: 10.1093/jamiaopen/ooaf012. eCollection 2025 Apr.

Leveraging large language models for knowledge-free weak supervision in clinical natural language processing.

Sci Rep. 2025 Mar 10;15(1):8241. doi: 10.1038/s41598-024-68168-2.

Automated redaction of names in adverse event reports using transformer-based neural networks.

BMC Med Inform Decis Mak. 2024 Dec 23;24(1):401. doi: 10.1186/s12911-024-02785-9.

Lightweight transformers for clinical natural language processing.

Nat Lang Eng. 2024 Sep;30(5):887-914. doi: 10.1017/S1351324923000542. Epub 2024 Jan 12.

Leveraging Large Language Models for Knowledge-free Weak Supervision in Clinical Natural Language Processing.

Res Sq. 2024 Jun 28:rs.3.rs-4559971. doi: 10.21203/rs.3.rs-4559971/v1.

Transformers and large language models in healthcare: A review.

Artif Intell Med. 2024 Aug;154:102900. doi: 10.1016/j.artmed.2024.102900. Epub 2024 Jun 5.

References

Creation of a new longitudinal corpus of clinical narratives.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S6-S10. doi: 10.1016/j.jbi.2015.09.018. Epub 2015 Oct 1.

Hidden Markov model using Dirichlet process for de-identification.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S60-S66. doi: 10.1016/j.jbi.2015.09.004. Epub 2015 Sep 25.

CRFs based de-identification of medical records.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S39-S46. doi: 10.1016/j.jbi.2015.08.012. Epub 2015 Aug 24.

Risk factor detection for heart disease by applying text analytics in electronic medical records.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S164-S170. doi: 10.1016/j.jbi.2015.08.011. Epub 2015 Aug 14.

Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S67-S77. doi: 10.1016/j.jbi.2015.07.001. Epub 2015 Jul 22.

Combining knowledge- and data-driven methods for de-identification of clinical narratives.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S53-S59. doi: 10.1016/j.jbi.2015.06.029. Epub 2015 Jul 22.

Preparing an annotated gold standard corpus to share with extramural investigators for de-identification research.

J Biomed Inform. 2014 Aug;50:173-183. doi: 10.1016/j.jbi.2014.01.014. Epub 2014 Feb 17.

BoB, a best-of-breed automated text de-identification system for VHA clinical documents.

J Am Med Inform Assoc. 2013 Jan 1;20(1):77-83. doi: 10.1136/amiajnl-2012-001020. Epub 2012 Sep 4.

Large-scale evaluation of automated clinical note de-identification and its impact on information extraction.

J Am Med Inform Assoc. 2013 Jan 1;20(1):84-94. doi: 10.1136/amiajnl-2012-001012. Epub 2012 Aug 2.

Overview of the ID, EPI and REL tasks of BioNLP Shared Task 2011.

BMC Bioinformatics. 2012 Jun 26;13 Suppl 11(Suppl 11):S2. doi: 10.1186/1471-2105-13-S11-S2.
