A Comparative Analysis of Speed and Accuracy for Three Off-the-Shelf De-Identification Tools.

Authors

Heider Paul M, Obeid Jihad S, Meystre Stéphane M

Affiliation

Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC.

Publication information

AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:241-250. eCollection 2020.

Abstract

A growing quantity of health data is being stored in Electronic Health Records (EHR). The free-text section of these clinical notes contains important patient and treatment information for research but also contains Personally Identifiable Information (PII), which cannot be freely shared within the research community without compromising patient confidentiality and privacy rights. Significant work has been invested in investigating automated approaches to text de-identification, the process of removing or redacting PII. Few studies have examined the performance of existing de-identification pipelines in a controlled comparative analysis. In this study, we use publicly available corpora to analyze speed and accuracy differences between three de-identification systems that can be run off-the-shelf: Amazon Comprehend Medical PHId, Clinacuity's CliniDeID, and the National Library of Medicine's Scrubber. No single system dominated all the compared metrics. NLM Scrubber was the fastest while CliniDeID generally had the highest accuracy.
