Examining the Generalizability of Pretrained De-identification Transformer Models on Narrative Nursing Notes.

Affiliations

Department of Biomedical Informatics, Columbia University, New York, New York, United States.

School of Nursing, University of Pennsylvania, Philadelphia, Pennsylvania, United States.

Publication information

Appl Clin Inform. 2024 Mar;15(2):357-367. doi: 10.1055/a-2282-4340. Epub 2024 Mar 6.

DOI:10.1055/a-2282-4340
PMID:38447965
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11078567/
Abstract

BACKGROUND

Narrative nursing notes are a valuable resource in informatics research with unique predictive signals about patient care. The open sharing of these data, however, is appropriately constrained by rigorous regulations set by the Health Insurance Portability and Accountability Act (HIPAA) for the protection of privacy. Several de-identification models have been developed and evaluated on the open-source i2b2 dataset, but their generalizability to nursing notes remains understudied.

OBJECTIVES

The study aims to understand the generalizability of pretrained transformer models and to investigate how protected health information (PHI) distribution patterns vary between discharge summaries and nursing notes, with the goal of informing the future design of model evaluation schemas.

METHODS

Two pretrained transformer models (RoBERTa, ClinicalBERT) fine-tuned on i2b2 2014 discharge summaries were evaluated on our dataset of inpatient nursing notes and compared with the baseline performance. Statistical testing was used to assess differences in PHI distribution between discharge summaries and nursing notes.
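The abstract does not name the statistical test used. As a minimal sketch, assuming a chi-square test of independence (one common choice for comparing category counts between two groups), the comparison of PHI distributions could look like the following. The function name `chi_square_independence` and the counts in the table are hypothetical and purely illustrative; they are not data from the study.

```python
from typing import List, Tuple


def chi_square_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for a contingency table.

    Rows are note types, columns are PHI categories (an assumed layout).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if note type and PHI category were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts, NOT from the study:
# rows = [discharge summaries, nursing notes], columns = [NAME, DATE]
counts = [[30, 10], [20, 40]]
stat, dof = chi_square_independence(counts)
print(f"chi-square = {stat:.3f}, dof = {dof}")
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom would suggest that the mix of PHI categories differs between the two note types.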

RESULTS

RoBERTa achieved the best performance when tested on an external data source, with an F1 score of 0.887 across PHI categories and 0.932 in the PHI binary task. Overall, discharge summaries contained a higher number of PHI instances and categories of PHI compared with inpatient nursing notes.
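The two F1 figures above come from two scoring setups: one that requires the predicted PHI category to match, and one that only asks whether a token is PHI at all. The sketch below illustrates the difference, assuming token-level, micro-averaged scoring with "O" as the non-PHI label; the abstract does not state the exact scoring scheme, and the function `phi_f1` and the toy labels are hypothetical.

```python
from typing import List


def phi_f1(gold: List[str], pred: List[str], binary: bool = False) -> float:
    """Micro-averaged token-level F1 over PHI labels; "O" marks non-PHI tokens.

    With binary=True, every PHI category is collapsed into a single "PHI" label,
    so a token counts as correct whenever it is flagged as PHI of any type.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if binary:
        gold = ["O" if g == "O" else "PHI" for g in gold]
        pred = ["O" if p == "O" else "PHI" for p in pred]

    true_pos = sum(1 for g, p in zip(gold, pred) if g != "O" and g == p)
    pred_pos = sum(1 for p in pred if p != "O")
    gold_pos = sum(1 for g in gold if g != "O")
    if true_pos == 0:
        return 0.0

    precision = true_pos / pred_pos
    recall = true_pos / gold_pos
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only.
gold = ["O", "NAME", "NAME", "O", "DATE", "O", "LOCATION"]
pred = ["O", "NAME", "DATE", "O", "DATE", "O", "O"]

category_f1 = phi_f1(gold, pred)               # category must match
binary_f1 = phi_f1(gold, pred, binary=True)    # PHI vs. non-PHI only
print(f"category F1 = {category_f1:.3f}, binary F1 = {binary_f1:.3f}")
```

In this toy case the third token is flagged as PHI but with the wrong category, so it counts as an error in the category setup and as a hit in the binary setup, which is why the binary score is the higher of the two.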

CONCLUSION

The study investigated the applicability of two pretrained transformers to inpatient nursing notes and examined how nursing notes and discharge summaries differ in their use of PHI. Discharge summaries presented a greater quantity of PHI instances and types when compared with narrative nursing notes, but narrative nursing notes exhibited more diversity in the types of PHI present, with some pertaining to patients' personal lives. The insights obtained from the research help improve the design and selection of algorithms and contribute to the development of suitable performance thresholds for PHI.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ce/11078567/b409410c5a1e/10-1055-a-2282-4340-i202310ra0214-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ce/11078567/774d0cdc82aa/10-1055-a-2282-4340-i202310ra0214-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ce/11078567/3ecc9c74130a/10-1055-a-2282-4340-i202310ra0214-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3ce/11078567/aaddaea38348/10-1055-a-2282-4340-i202310ra0214-4.jpg
