Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus.

Authors

Stubbs Amber, Uzuner Özlem

Affiliations

School of Library and Information Science, Simmons College, Boston, MA, USA.

Department of Information Studies, State University of New York at Albany, Albany, NY, USA.

Publication information

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S20-S29. doi: 10.1016/j.jbi.2015.07.020. Epub 2015 Aug 28.

Abstract

The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on the de-identification of longitudinal medical records. For this track, we de-identified a set of 1304 longitudinal medical records describing 296 patients. This corpus was de-identified under a broad interpretation of the HIPAA guidelines using double-annotation followed by arbitration, rounds of sanity checking, and proofreading. The average token-based F1 measure for the annotators compared to the gold standard was 0.927. The resulting annotations were used both to de-identify the data and to set the gold standard for the de-identification track of the 2014 i2b2/UTHealth shared task. All annotated private health information was replaced with realistic surrogates automatically and then read over and corrected manually. The resulting corpus is the first of its kind made available for de-identification research. This corpus was first used for the 2014 i2b2/UTHealth shared task, during which the systems achieved a mean F-measure of 0.872 and a maximum F-measure of 0.964 using entity-based micro-averaged evaluations.
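The entity-based micro-averaged F-measure reported above can be illustrated with a minimal sketch. It assumes strict matching, where a predicted entity counts as correct only if its start offset, end offset, and category all equal a gold entity. The function name `micro_prf`, the `Span` tuple layout, and the sample records are hypothetical and are not taken from the shared task's evaluation script, which may apply different matching rules.

```python
from typing import Dict, Set, Tuple

# Hypothetical entity representation: (start offset, end offset, PHI category).
Span = Tuple[int, int, str]


def micro_prf(
    gold: Dict[str, Set[Span]], pred: Dict[str, Set[Span]]
) -> Tuple[float, float, float]:
    """Pool TP/FP/FN counts over all documents, then compute precision,
    recall, and F1 once from the pooled counts (micro-averaging)."""
    tp = fp = fn = 0
    for doc_id in gold.keys() | pred.keys():
        g = gold.get(doc_id, set())
        p = pred.get(doc_id, set())
        tp += len(g & p)   # entities that match exactly (assumed strict match)
        fp += len(p - g)   # predicted entities absent from the gold standard
        fn += len(g - p)   # gold entities the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up annotations for two records, for illustration only.
gold = {
    "record-1": {(0, 10, "NAME"), (25, 35, "DATE")},
    "record-2": {(5, 12, "HOSPITAL")},
}
pred = {
    "record-1": {(0, 10, "NAME"), (40, 44, "AGE")},
    "record-2": {(5, 12, "HOSPITAL")},
}

precision, recall, f1 = micro_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because counts are pooled before the ratios are taken, records with many entities weigh more heavily than they would under a per-record (macro) average. A token-based score such as the annotator F1 above would follow the same arithmetic with individual tokens in place of whole entities.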
