An evaluation of existing text de-identification tools for use with patient progress notes from Australian general practice.

Authors

El-Hayek Carol, Barzegar Siamak, Faux Noel, Doyle Kim, Pillai Priyanka, Mutch Simon J, Vaisey Alaina, Ward Roger, Sanci Lena, Dunn Adam G, Hellard Margaret E, Hocking Jane S, Verspoor Karin, Boyle Douglas Ir

Affiliations

Burnet Institute, Melbourne, Australia; Melbourne School of Population and Global Health, University of Melbourne, Australia; School of Public Health and Preventive Medicine, Monash University, Australia.

School of Computing and Information Systems, University of Melbourne, Australia.

Publication information

Int J Med Inform. 2023 May;173:105021. doi: 10.1016/j.ijmedinf.2023.105021. Epub 2023 Feb 11.

DOI:10.1016/j.ijmedinf.2023.105021
PMID:36870249
Abstract

INTRODUCTION

Digitized patient progress notes from general practice represent a significant resource for clinical and public health research but cannot feasibly and ethically be used for these purposes without automated de-identification. Internationally, several open-source natural language processing tools have been developed; however, given wide variations in clinical documentation practices, these cannot be used without appropriate review. We evaluated the performance of four de-identification tools and assessed their suitability for customization to Australian general practice progress notes.

METHODS

Four tools were selected: three rule-based (HMS Scrubber, MIT De-id, Philter) and one machine learning (MIST). A total of 300 patient progress notes from three general practice clinics were manually annotated for personally identifying information. We conducted a pairwise comparison between the manual annotations and the patient identifiers automatically detected by each tool, measuring recall (sensitivity), precision (positive predictive value), F1-score (harmonic mean of precision and recall), and F2-score (which weights recall twice as heavily as precision). Error analysis was also conducted to better understand each tool's structure and performance.
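
As an illustration of the metrics named above, the following is a minimal sketch and not the authors' evaluation code. It assumes identifiers are represented as hypothetical `(start, end, category)` spans and that a detection counts as correct only on an exact match with a manual annotation; the abstract does not state the matching criterion the study used. The F2-score is taken to be the standard F-beta measure with beta = 2.

```python
from typing import Dict, Set, Tuple

# Hypothetical span representation: (start offset, end offset, category).
Span = Tuple[int, int, str]


def score(gold: Set[Span], predicted: Set[Span], beta: float = 1.0) -> Dict[str, float]:
    """Compare tool output with manual annotations using exact span matching."""
    tp = len(gold & predicted)   # identifiers found by both annotator and tool
    fp = len(predicted - gold)   # tool detections with no matching annotation
    fn = len(gold - predicted)   # annotated identifiers the tool missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F-beta: beta = 1 gives the harmonic mean; beta = 2 favours recall.
    b2 = beta * beta
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0

    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f_beta": f_beta}


# Made-up example: four annotated identifiers, three detections, two of them correct.
gold_spans = {(0, 8, "NAME"), (15, 25, "DATE"), (40, 52, "LOCATION"), (60, 70, "NAME")}
tool_spans = {(0, 8, "NAME"), (15, 25, "DATE"), (30, 35, "NAME")}

result_f1 = score(gold_spans, tool_spans, beta=1.0)
result_f2 = score(gold_spans, tool_spans, beta=2.0)
print(result_f1)
print(result_f2)
```

Because de-identification treats a missed identifier as more costly than an over-redacted token, reporting the F2-score alongside the F1-score makes that preference for recall explicit.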

RESULTS

Manual annotation detected 701 identifiers in seven categories. The rule-based tools detected identifiers in six categories and MIST in three. Philter achieved the highest aggregate recall (67%) and the highest recall for NAME (87%). HMS Scrubber achieved the highest recall for DATE (94%), and all tools performed poorly on LOCATION. MIST achieved the highest precision for NAME and DATE, with recall for DATE similar to that of the rule-based tools and the highest recall for LOCATION. Philter had the lowest aggregate precision (37%); however, preliminary adjustments of its rules and dictionaries substantially reduced false positives.

CONCLUSION

Existing off-the-shelf solutions for automated de-identification of clinical text are not immediately suitable for our context without modification. Philter is the most promising candidate because of its high recall and flexibility; however, it will require extensive revision of its pattern-matching rules and dictionaries.

Similar articles

1
An evaluation of existing text de-identification tools for use with patient progress notes from Australian general practice.
Int J Med Inform. 2023 May;173:105021. doi: 10.1016/j.ijmedinf.2023.105021. Epub 2023 Feb 11.
2
An Extensible Evaluation Framework Applied to Clinical Text Deidentification Natural Language Processing Tools: Multisystem and Multicorpus Study.
J Med Internet Res. 2024 May 28;26:e55676. doi: 10.2196/55676.
3
Automated de-identification of free-text medical records.
BMC Med Inform Decis Mak. 2008 Jul 24;8:32. doi: 10.1186/1472-6947-8-32.
4
A study of deep learning methods for de-identification of clinical notes in cross-institute settings.
BMC Med Inform Decis Mak. 2019 Dec 5;19(Suppl 5):232. doi: 10.1186/s12911-019-0935-4.
5
Improved de-identification of physician notes through integrative modeling of both public and private medical text.
BMC Med Inform Decis Mak. 2013 Oct 2;13:112. doi: 10.1186/1472-6947-13-112.
6
Optimizing annotation resources for natural language de-identification via a game theoretic framework.
J Biomed Inform. 2016 Jun;61:97-109. doi: 10.1016/j.jbi.2016.03.019. Epub 2016 Mar 25.
7
Evaluating current automatic de-identification methods with Veteran's health administration clinical documents.
BMC Med Res Methodol. 2012 Jul 27;12:109. doi: 10.1186/1471-2288-12-109.
8
Automatic de-identification of French electronic health records: a cost-effective approach exploiting distant supervision and deep learning models.
BMC Med Inform Decis Mak. 2024 Feb 16;24(1):54. doi: 10.1186/s12911-024-02422-5.
9
Ensemble Approaches to Recognize Protected Health Information in Radiology Reports.
J Digit Imaging. 2022 Dec;35(6):1694-1698. doi: 10.1007/s10278-022-00673-0. Epub 2022 Jun 17.
10
A De-identification method for bilingual clinical texts of various note types.
J Korean Med Sci. 2015 Jan;30(1):7-15. doi: 10.3346/jkms.2015.30.1.7. Epub 2014 Dec 23.

Cited by

1
DIRI: Adversarial Patient Reidentification with Large Language Models for Evaluating Clinical Text Anonymization.
AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:355-364. eCollection 2025.