Assessing the difficulty and time cost of de-identification in clinical narratives.

Authors

Dorr D A, Phillips W F, Phansalkar S, Sims S A, Hurdle J F

Affiliation

Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR 97239, USA.

Publication

Methods Inf Med. 2006;45(3):246-52.

PMID: 16685332

Abstract

OBJECTIVE

To characterize the difficulty confronting investigators in removing protected health information (PHI) from cross-discipline, free-text clinical notes, an important challenge to clinical informatics research as recalibrated by the introduction of the US Health Insurance Portability and Accountability Act (HIPAA) and similar regulations.

METHODS

Randomized selection of clinical narratives from complete admissions written by diverse providers, reviewed using a two-tiered rater system and simple automated regular expression tools. For manual review, two independent reviewers used simple search and replace algorithms and visual scanning to find PHI as defined by HIPAA, followed by an independent second review to detect any missed PHI. Simple automated review was also performed for the "easy" PHI that are number- or date-based.
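
The abstract does not give the regular expressions that were used, so the following is only a minimal sketch of what a "simple automated" pass over number- or date-based identifiers might look like. The pattern set, the labels, and the `scrub` helper are all assumptions made for illustration, not the authors' tool.

```python
import re

# Hypothetical patterns for "easy" number- or date-based identifiers.
# These are illustrative assumptions, not the expressions used in the study.
PHI_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scrub(text: str) -> str:
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(scrub("Seen 03/14/2005, call 555-123-4567."))
# A dosing schedule that merely looks like a date is also replaced,
# which is one plausible source of false positives for such patterns.
print(scrub("Titrate 1-2-10 mg"))
```

Patterns this loose tend to catch most true dates and numbers but also flag look-alike clinical values, so a pass of this kind would usually need manual review of its output.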

RESULTS

From 262 notes, 2074 PHI, or 7.9 +/- 6.1 per note, were found. The average recall (or sensitivity) was 95.9% while precision was 99.6% for single reviewers. Agreement between individual reviewers was strong (ICC = 0.99), although some asymmetry in errors was seen between reviewers (p = 0.001). The automated technique had better recall (98.5%) but worse precision (88.4%) for its subset of identifiers. Manually de-identifying a note took 87.3 +/- 61 seconds on average.
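
As a rough check on what these figures imply, the sketch below recomputes the per-note PHI rate and derives two quantities that the abstract does not report: the total time for one manual pass over all notes, and the number of PHI a single reviewer would be expected to miss. The derived values are back-of-envelope assumptions (one reviewer, one pass, the mean time applied to every note), and the function and variable names are chosen here for illustration.

```python
def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of real PHI that was found (sensitivity)."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of flagged items that were real PHI."""
    return true_pos / (true_pos + false_pos)


# Figures reported in the abstract.
NOTES = 262
PHI_TOTAL = 2074
SECONDS_PER_NOTE = 87.3
SINGLE_REVIEWER_RECALL = 0.959

# Reported as 7.9 PHI per note.
phi_per_note = PHI_TOTAL / NOTES

# Derived, not reported: one manual pass over every note by one reviewer.
total_hours = NOTES * SECONDS_PER_NOTE / 3600

# Derived, not reported: PHI expected to slip past a single reviewer.
expected_missed = PHI_TOTAL * (1 - SINGLE_REVIEWER_RECALL)

print(f"PHI per note: {phi_per_note:.1f}")
print(f"Hours for one manual pass: {total_hours:.2f}")
print(f"PHI missed by one reviewer (expected): {expected_missed:.0f}")
```

Under these assumptions a single pass takes a little over six hours and leaves on the order of 85 identifiers undetected, which is consistent with the study's use of an independent second review.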

CONCLUSIONS

Manual de-identification of free-text notes is tedious and time-consuming, but even simple PHI is difficult to automatically identify with the exactitude required under HIPAA.

Similar articles

1
Assessing the difficulty and time cost of de-identification in clinical narratives.
Methods Inf Med. 2006;45(3):246-52.
2
Automated de-identification of free-text medical records.
BMC Med Inform Decis Mak. 2008 Jul 24;8:32. doi: 10.1186/1472-6947-8-32.
3
Location bias of identifiers in clinical narratives.
AMIA Annu Symp Proc. 2013 Nov 16;2013:560-9. eCollection 2013.
4
Automatic de-identification of textual documents in the electronic health record: a review of recent research.
BMC Med Res Methodol. 2010 Aug 2;10:70. doi: 10.1186/1471-2288-10-70.
5
De-Identification of Medical Narrative Data.
Stud Health Technol Inform. 2017;244:23-27.
6
Inductive creation of an annotation schema and a reference standard for de-identification of VA electronic clinical notes.
AMIA Annu Symp Proc. 2009 Nov 14;2009:416-20.
7
Evaluating current automatic de-identification methods with Veteran's health administration clinical documents.
BMC Med Res Methodol. 2012 Jul 27;12:109. doi: 10.1186/1471-2288-12-109.
8
Preparing an annotated gold standard corpus to share with extramural investigators for de-identification research.
J Biomed Inform. 2014 Aug;50:173-183. doi: 10.1016/j.jbi.2014.01.014. Epub 2014 Feb 17.
9
Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text.
J Biomed Inform. 2014 Aug;50:162-72. doi: 10.1016/j.jbi.2014.05.002. Epub 2014 May 20.
10
Large-scale evaluation of automated clinical note de-identification and its impact on information extraction.
J Am Med Inform Assoc. 2013 Jan 1;20(1):84-94. doi: 10.1136/amiajnl-2012-001012. Epub 2012 Aug 2.

Cited by

1
pyDeid: an improved, fast, flexible, and generalizable rule-based approach for deidentification of free-text medical records.
JAMIA Open. 2025 Jan 22;8(1):ooae152. doi: 10.1093/jamiaopen/ooae152. eCollection 2025 Feb.
2
Report of the Medical Image De-Identification (MIDI) Task Group -- Best Practices and Recommendations.
ArXiv. 2025 Mar 16:arXiv:2303.10473v3.
3
The OpenDeID corpus for patient de-identification.
Sci Rep. 2021 Oct 7;11(1):19973. doi: 10.1038/s41598-021-99554-9.
4
Transferability of neural network clinical deidentification systems.
J Am Med Inform Assoc. 2021 Nov 25;28(12):2661-2669. doi: 10.1093/jamia/ocab207.
5
Benchmarking Modern Named Entity Recognition Techniques for Free-text Health Record Deidentification.
AMIA Jt Summits Transl Sci Proc. 2021 May 17;2021:102-111. eCollection 2021.
6
Resilience of clinical text de-identified with "hiding in plain sight" to hostile reidentification attacks by human readers.
J Am Med Inform Assoc. 2020 Jul 1;27(9):1374-1382. doi: 10.1093/jamia/ocaa095.
7
A study of deep learning methods for de-identification of clinical notes in cross-institute settings.
BMC Med Inform Decis Mak. 2019 Dec 5;19(Suppl 5):232. doi: 10.1186/s12911-019-0935-4.
8
The machine giveth and the machine taketh away: a parrot attack on clinical text deidentified with hiding in plain sight.
J Am Med Inform Assoc. 2019 Dec 1;26(12):1536-1544. doi: 10.1093/jamia/ocz114.
9
Efficient Active Learning for Electronic Medical Record De-identification.
AMIA Jt Summits Transl Sci Proc. 2019 May 6;2019:462-471. eCollection 2019.
10
A two-site survey of medical center personnel's willingness to share clinical data for research: implications for reproducible health NLP research.
BMC Med Inform Decis Mak. 2019 Apr 4;19(Suppl 3):70. doi: 10.1186/s12911-019-0778-z.