Customization scenarios for de-identification of clinical notes.

Affiliations

Google Research, Google LLC, 1600 Amphitheatre Parkway, Mountain View, CA, USA.

, Palo Alto, CA, USA.

Publication information

BMC Med Inform Decis Mak. 2020 Jan 30;20(1):14. doi: 10.1186/s12911-020-1026-2.

DOI:10.1186/s12911-020-1026-2
PMID:32000770
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6993314/
Abstract

BACKGROUND

Automated machine-learning systems are able to de-identify electronic medical records, including free-text clinical notes. Use of such systems would greatly boost the amount of data available to researchers, yet their deployment has been limited due to uncertainty about their performance when applied to new datasets.

OBJECTIVE

We present practical options for clinical note de-identification, assessing performance of machine learning systems ranging from off-the-shelf to fully customized.

METHODS

We implement a state-of-the-art machine learning de-identification system, training and testing on pairs of datasets that match the deployment scenarios. We use clinical notes from two i2b2 competition corpora, the Physionet Gold Standard corpus, and parts of the MIMIC-III dataset.

RESULTS

Fully customized systems remove 97-99% of personally identifying information. Performance of off-the-shelf systems varies by dataset, with performance mostly above 90%. Providing a small labeled dataset or large unlabeled dataset allows for fine-tuning that improves performance over off-the-shelf systems.
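
To make the "percentage of personally identifying information removed" figure concrete, here is a minimal sketch of how such a number could be computed as token-level recall. This is an illustration under stated assumptions, not the authors' evaluation code: the function name `phi_recall`, the tag names, and the choice of token-level (rather than entity-level) counting are all hypothetical.

```python
def phi_recall(gold_labels, predicted_labels, outside_tag="O"):
    """Fraction of gold-standard PHI tokens that the system also flagged.

    Assumption: one label per token, with `outside_tag` marking non-PHI
    tokens. Any non-outside prediction counts as "removed", even if the
    predicted PHI type differs from the gold type.
    """
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    phi_total = 0
    phi_removed = 0
    for gold, pred in zip(gold_labels, predicted_labels):
        if gold != outside_tag:
            phi_total += 1
            if pred != outside_tag:
                phi_removed += 1
    if phi_total == 0:
        return 1.0  # nothing needed removing
    return phi_removed / phi_total


# Hypothetical toy example: four PHI tokens, one of which is missed.
gold = ["O", "NAME", "NAME", "O", "DATE", "O", "HOSPITAL"]
pred = ["O", "NAME", "O", "O", "DATE", "O", "HOSPITAL"]
print(phi_recall(gold, pred))  # 0.75
```

Under this reading, a system that "removes 97-99%" of identifiers would score between 0.97 and 0.99 on a held-out labeled set drawn from the target dataset.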

CONCLUSION

Health organizations should be aware of the levels of customization available when selecting a de-identification deployment solution, in order to choose the one that best matches their resources and target performance level.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2cbf/6993314/1f4bdb209b28/12911_2020_1026_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2cbf/6993314/778ce6a1dc34/12911_2020_1026_Fig1_HTML.jpg