
Large-scale evaluation of automated clinical note de-identification and its impact on information extraction.

Affiliation

Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229-3039, USA.

Publication information

J Am Med Inform Assoc. 2013 Jan 1;20(1):84-94. doi: 10.1136/amiajnl-2012-001012. Epub 2012 Aug 2.

DOI:10.1136/amiajnl-2012-001012
PMID:22859645
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3555323/
Abstract

OBJECTIVE

(1) To evaluate a state-of-the-art natural language processing (NLP)-based approach to automatically de-identify a large set of diverse clinical notes. (2) To measure the impact of de-identification on the performance of information extraction algorithms on the de-identified documents.

MATERIAL AND METHODS

A cross-sectional study that included 3503 stratified, randomly selected clinical notes (over 22 note types) from five million documents produced at one of the largest US pediatric hospitals. Sensitivity, precision, and F value were computed for two automated de-identification systems that remove all 18 HIPAA-defined protected health information elements. Performance was assessed against a manually generated 'gold standard'. Statistical significance was tested. The automated de-identification performance was also compared with that of two humans on a 10% subsample of the gold standard. The effect of de-identification on the performance of subsequent medication extraction was measured.
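The evaluation described above can be illustrated with a minimal sketch. It assumes exact-match comparison of labeled character spans and a balanced F value (F1); the abstract does not state the matching criterion or the F weighting the authors used, and the `Span` type, the `evaluate` function, and the sample spans are hypothetical.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A hypothetical PHI annotation: character offsets plus a category label."""
    start: int
    end: int
    label: str


def evaluate(gold: set[Span], predicted: set[Span]) -> dict[str, float]:
    """Score predicted PHI spans against a gold standard using exact matching."""
    tp = len(gold & predicted)   # PHI elements the system found
    fn = len(gold - predicted)   # PHI elements the system missed
    fp = len(predicted - gold)   # spans flagged that are not gold-standard PHI
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    total = precision + sensitivity
    # Balanced F value (assumed): harmonic mean of precision and sensitivity.
    f_value = 2 * precision * sensitivity / total if total else 0.0
    return {"sensitivity": sensitivity, "precision": precision, "f_value": f_value}


# Invented example data, not taken from the study.
gold = {Span(0, 8, "NAME"), Span(20, 30, "DATE"), Span(45, 57, "PHONE")}
predicted = {Span(0, 8, "NAME"), Span(20, 30, "DATE"), Span(60, 66, "ID")}
scores = evaluate(gold, predicted)
print(scores)
```

In de-identification, sensitivity is usually the more critical figure, since each false negative is a protected element left in the text, whereas a false positive only removes non-identifying content.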

RESULTS

The gold standard included 30 815 protected health information elements and more than one million tokens. The most accurate NLP method had 91.92% sensitivity (R) and 95.08% precision (P) overall. The performance of the system was indistinguishable from that of human annotators: on the same text, the two annotators achieved 92.15% (R)/93.95% (P) and 94.55% (R)/88.45% (P) overall, while the best system obtained 92.91% (R)/95.73% (P). Automated de-identification had minimal impact on the utility of the narrative notes for subsequent information extraction, as measured by the sensitivity and precision of medication name extraction.
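The abstract reports sensitivity and precision but not the F values themselves. As a rough aid for comparing the system with the annotators, the sketch below derives a balanced F value from each reported pair. The balanced weighting (beta = 1) is an assumption, the `f_value` helper and the dictionary labels are hypothetical, and the results are derived here rather than quoted from the paper.

```python
def f_value(sensitivity: float, precision: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and sensitivity (F-beta)."""
    b2 = beta * beta
    denom = b2 * precision + sensitivity
    return (1 + b2) * precision * sensitivity / denom if denom else 0.0


# (sensitivity, precision) pairs as reported in the abstract.
reported = {
    "best system, full gold standard": (0.9192, 0.9508),
    "best system, 10% subsample": (0.9291, 0.9573),
    "annotator 1, 10% subsample": (0.9215, 0.9395),
    "annotator 2, 10% subsample": (0.9455, 0.8845),
}

derived = {name: f_value(r, p) for name, (r, p) in reported.items()}
for name, f in derived.items():
    print(f"{name}: F = {f:.4f}")
```

Because the harmonic mean penalizes an imbalance between the two measures, the second annotator's higher sensitivity does not offset the lower precision in this derived figure.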

DISCUSSION AND CONCLUSION

NLP-based de-identification shows excellent performance that rivals the performance of human annotators. Furthermore, unlike manual de-identification, the automated approach scales up to millions of documents quickly and inexpensively.


