Bootstrapping a de-identification system for narrative patient records: cost-performance tradeoffs.

Author information

Department of Pediatrics, University of Michigan, Ann Arbor, MI, USA.

Publication information

Int J Med Inform. 2013 Sep;82(9):821-31. doi: 10.1016/j.ijmedinf.2013.03.005. Epub 2013 Apr 30.

DOI:10.1016/j.ijmedinf.2013.03.005

PMID:23643147

Abstract

PURPOSE

We describe an experiment to build a de-identification system for clinical records using the open source MITRE Identification Scrubber Toolkit (MIST). We quantify the human annotation effort needed to produce a system that de-identifies at high accuracy.

METHODS

Using two types of clinical records (history and physical notes, and social work notes), we iteratively built statistical de-identification models by annotating 10 notes, training a model, applying the model to another 10 notes, correcting the model's output, and training from the resulting larger set of annotated notes. This was repeated for 20 rounds of 10 notes each, and then an additional 6 rounds of 20 notes each, and a final round of 40 notes. At each stage, we measured precision, recall, and F-score, and compared these to the amount of annotation time needed to complete the round.
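The round schedule and the annotate-train-correct loop described above can be sketched as follows. This is a minimal illustration, not the authors' code and not the MIST API: `train`, `pre_annotate`, and `correct` are hypothetical callables standing in for model training, model application, and human correction. It also assumes that the initial from-scratch annotation counts as round 1 of the 20 ten-note rounds.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Round sizes as stated in the abstract: 20 rounds of 10 notes,
# 6 rounds of 20 notes, and one final round of 40 notes.
ROUND_SIZES: List[int] = [10] * 20 + [20] * 6 + [40]


def cumulative_notes(sizes: Sequence[int]) -> List[int]:
    """Return the number of annotated notes available after each round."""
    totals: List[int] = []
    running = 0
    for size in sizes:
        running += size
        totals.append(running)
    return totals


def bootstrap(
    notes: Sequence[Any],
    sizes: Sequence[int],
    train: Callable[[List[Any]], Any],       # hypothetical: fit a model on annotated notes
    pre_annotate: Callable[[Any, Any], Any],  # hypothetical: apply the model to one note
    correct: Callable[[Any], Any],            # hypothetical: human fixes the draft annotation
) -> Tuple[Any, List[Any]]:
    """Grow the annotated set round by round, retraining after each round."""
    annotated: List[Any] = []
    model = None
    start = 0
    for size in sizes:
        batch = notes[start:start + size]
        start += size
        # In round 1 there is no model yet, so the annotator works from the raw note.
        drafts = [n if model is None else pre_annotate(model, n) for n in batch]
        annotated.extend(correct(d) for d in drafts)
        # Retrain on everything annotated so far (the "resulting larger set").
        model = train(annotated)
    return model, annotated


if __name__ == "__main__":
    totals = cumulative_notes(ROUND_SIZES)
    print(len(ROUND_SIZES), "rounds,", totals[-1], "notes in total")
    print("notes annotated through round 21:", totals[20])
```

Under these assumptions the schedule comes to 27 rounds and 360 notes, with 220 notes annotated by the end of round 21.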

RESULTS

After the initial 10-note round (33 min of annotation time), we achieved an F-score of 0.89. After just over 8 h of annotation time (round 21), we achieved an F-score of 0.95. The number of annotation actions needed, as well as the time needed, decreased in later rounds as model performance improved. Accuracy on history and physical notes exceeded that on social work notes, suggesting that the wider variety of protected health information (PHI) and of its contexts in social work notes is more difficult to model.
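For readers unfamiliar with the metrics reported here, the sketch below shows how precision, recall, and F-score relate. The abstract does not state the matching unit or the F-score weighting, so this example assumes exact span matching and the balanced F1 (the harmonic mean of precision and recall). The spans and PHI type labels are made-up illustrative values, not data from the study.

```python
from typing import Set, Tuple

# A PHI annotation as (start offset, end offset, PHI type); the labels are hypothetical.
Span = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Score predicted PHI spans against gold spans using exact matching."""
    tp = len(gold & predicted)   # spans the model found correctly
    fp = len(predicted - gold)   # spans the model marked that are not PHI
    fn = len(gold - predicted)   # PHI spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {
        (0, 8, "NAME"), (15, 25, "DATE"), (40, 52, "HOSPITAL"),
        (60, 72, "PHONE"), (80, 85, "AGE"),
    }
    predicted = {
        (0, 8, "NAME"), (15, 25, "DATE"), (40, 52, "HOSPITAL"),
        (90, 95, "NAME"),
    }
    p, r, f = precision_recall_f1(gold, predicted)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
    # prints: precision=0.75 recall=0.60 f1=0.67
```

In de-identification, recall is usually the more safety-critical number, since each false negative is a piece of PHI left in the released text.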

CONCLUSIONS

It is possible, with modest effort, to build a functioning de-identification system de novo using the MIST framework. The resulting system achieved performance comparable to other high-performing de-identification systems.

Similar articles

Bootstrapping a de-identification system for narrative patient records: cost-performance tradeoffs.

Int J Med Inform. 2013 Sep;82(9):821-31. doi: 10.1016/j.ijmedinf.2013.03.005. Epub 2013 Apr 30.

De-identification of clinical notes in French: towards a protocol for reference corpus development.

J Biomed Inform. 2014 Aug;50:151-61. doi: 10.1016/j.jbi.2013.12.014. Epub 2013 Dec 29.

Developing a standard for de-identifying electronic patient records written in Swedish: precision, recall and F-measure in a manual and computerized annotation trial.

Int J Med Inform. 2009 Dec;78(12):e19-26. doi: 10.1016/j.ijmedinf.2009.04.005. Epub 2009 May 23.

Text de-identification for privacy protection: a study of its impact on clinical text information content.

J Biomed Inform. 2014 Aug;50:142-50. doi: 10.1016/j.jbi.2014.01.011. Epub 2014 Feb 3.

The MITRE Identification Scrubber Toolkit: design, training, and assessment.

Int J Med Inform. 2010 Dec;79(12):849-59. doi: 10.1016/j.ijmedinf.2010.09.007. Epub 2010 Oct 14.

Proposal and evaluation of FASDIM, a Fast And Simple De-Identification Method for unstructured free-text clinical records.

Int J Med Inform. 2014 Apr;83(4):303-12. doi: 10.1016/j.ijmedinf.2013.11.005. Epub 2013 Dec 7.

Effects of personal identifier resynthesis on clinical text de-identification.

J Am Med Inform Assoc. 2010 Mar-Apr;17(2):159-68. doi: 10.1136/jamia.2009.002212.

Development and evaluation of a de-identification procedure for a case register sourced from mental health electronic records.

BMC Med Inform Decis Mak. 2013 Jul 11;13:71. doi: 10.1186/1472-6947-13-71.

An evaluation of existing text de-identification tools for use with patient progress notes from Australian general practice.

Int J Med Inform. 2023 May;173:105021. doi: 10.1016/j.ijmedinf.2023.105021. Epub 2023 Feb 11.

Automated de-identification of free-text medical records.

BMC Med Inform Decis Mak. 2008 Jul 24;8:32. doi: 10.1186/1472-6947-8-32.

Cited by

Web-Based Application Based on Human-in-the-Loop Deep Learning for Deidentifying Free-Text Data in Electronic Medical Records: Development and Usability Study.

Interact J Med Res. 2023 Aug 25;12:e46322. doi: 10.2196/46322.

Classifying Cyber-Risky Clinical Notes by Employing Natural Language Processing.

Proc Annu Hawaii Int Conf Syst Sci. 2022;2022:4140-4146. doi: 10.24251/hicss.2022.505. Epub 2022 Jan 4.

Disambiguating Clinical Abbreviations Using a One-Fits-All Classifier Based on Deep Learning Techniques.

Methods Inf Med. 2022 Jun;61(S 01):e28-e34. doi: 10.1055/s-0042-1742388. Epub 2022 Feb 1.

Efficient Active Learning for Electronic Medical Record De-identification.

AMIA Jt Summits Transl Sci Proc. 2019 May 6;2019:462-471. eCollection 2019.

An annotation and modeling schema for prescription regimens.

J Biomed Semantics. 2019 May 31;10(1):10. doi: 10.1186/s13326-019-0201-9.

Annu Rev Biomed Data Sci. 2018 Jul;1:115-129. doi: 10.1146/annurev-biodatasci-080917-013416.

De-identification of patient notes with recurrent neural networks.

J Am Med Inform Assoc. 2017 May 1;24(3):596-606. doi: 10.1093/jamia/ocw156.

Is the Juice Worth the Squeeze? Costs and Benefits of Multiple Human Annotators for Clinical Text De-identification.

Methods Inf Med. 2016 Aug 5;55(4):356-64. doi: 10.3414/ME15-01-0122. Epub 2016 Jul 13.

Using Nonexperts for Annotating Pharmacokinetic Drug-Drug Interaction Mentions in Product Labeling: A Feasibility Study.

JMIR Res Protoc. 2016 Apr 11;5(2):e40. doi: 10.2196/resprot.5028.

NOBLE - Flexible concept recognition for large-scale biomedical natural language processing.

BMC Bioinformatics. 2016 Jan 14;17:32. doi: 10.1186/s12859-015-0871-y.
