Automated redaction of names in adverse event reports using transformer-based neural networks.

Authors

Meldau Eva-Lisa, Bista Shachi, Melgarejo-González Carlos, Norén G Niklas

Affiliation

Uppsala Monitoring Centre, Uppsala, Sweden.

Publication

BMC Med Inform Decis Mak. 2024 Dec 23;24(1):401. doi: 10.1186/s12911-024-02785-9.

DOI:10.1186/s12911-024-02785-9
PMID:39716217
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11668006/
Abstract

BACKGROUND

Automated recognition and redaction of personal identifiers in free text can enable organisations to share data while protecting privacy. This is important in the context of pharmacovigilance since relevant detailed information on the clinical course of events, differential diagnosis, and patient-reported reflections may often only be conveyed in narrative form. The aim of this study is to develop and evaluate a method for automated redaction of person names in English narrative text on adverse event reports. The target domain for this study was case narratives from the United Kingdom's Yellow Card scheme, which collects and monitors information on suspected side effects to medicines and vaccines.

METHODS

We finetuned BERT - a transformer-based neural network - for recognising names in case narratives. Training data consisted of newly annotated records from the Yellow Card data and of the i2b2 2014 deidentification challenge. Because the Yellow Card data contained few names, we used predictive models to select narratives for training. Performance was evaluated on a separate set of annotated narratives from the Yellow Card scheme. In-depth review determined whether (parts of) person names missed by the de-identification method could enable re-identification of the individual, and whether de-identification reduced the clinical utility of narratives by collaterally masking relevant information.
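
As a rough illustration of the redaction step only, the sketch below masks tokens that a name recogniser has flagged. It assumes token-level predictions are already available as a list of booleans; the function name `redact`, the `[NAME]` placeholder, and the example sentence are hypothetical and are not taken from the study, which does not specify in this abstract how masked names are rendered.

```python
from typing import List

MASK = "[NAME]"  # hypothetical placeholder; the abstract does not specify one


def redact(tokens: List[str], is_name: List[bool], mask: str = MASK) -> str:
    """Replace each run of consecutive name tokens with a single mask."""
    if len(tokens) != len(is_name):
        raise ValueError("tokens and is_name must have the same length")
    out: List[str] = []
    prev_masked = False
    for token, flagged in zip(tokens, is_name):
        if flagged:
            # Collapse adjacent name tokens (e.g. initial + surname) into one mask.
            if not prev_masked:
                out.append(mask)
            prev_masked = True
        else:
            out.append(token)
            prev_masked = False
    return " ".join(out)


# Made-up example; the flags stand in for the output of a fine-tuned classifier.
tokens = ["Patient", "J", "Smith", "reported", "a", "rash", "after", "vaccination"]
flags = [False, True, True, False, False, False, False, False]
redacted = redact(tokens, flags)
print(redacted)
```

Collapsing adjacent flagged tokens into one placeholder is a design choice made here for readability; masking token by token would be an equally valid reading of the method.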

RESULTS

Recall on held-out Yellow Card data was 87% (155/179) at a precision of 55% (155/282) and a false-positive rate of 0.05% (127/263,451). Considering tokens longer than three characters separately, recall was 94% (102/108) and precision 58% (102/175). For 13 of the 5,042 narratives in Yellow Card test data (71 with person names), the method failed to flag at least one name token. According to in-depth review, the leaked information could enable direct identification for one narrative and indirect identification for two narratives. Clinically relevant information was removed in less than 1% of the 5,042 processed narratives; 97% of the narratives were completely untouched.
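
The reported percentages follow directly from the token counts in parentheses. The short sketch below recomputes them, assuming the usual definitions (recall = TP / (TP + FN), precision = TP / (TP + FP), false-positive rate = FP / all non-name tokens); the helper name `rates` is illustrative, and the false-negative and false-positive counts are derived from the reported fractions rather than stated explicitly in the abstract.

```python
def rates(tp: int, fn: int, fp: int, negatives: int):
    """Return (recall, precision, false-positive rate) from token counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    fpr = fp / negatives  # negatives = all tokens that are not part of a name
    return recall, precision, fpr


# All name tokens: 155 of 179 found, 282 tokens flagged, 263,451 non-name tokens.
tp = 155
fn = 179 - tp   # derived: 24 name tokens missed
fp = 282 - tp   # derived: 127 non-name tokens flagged
recall, precision, fpr = rates(tp, fn, fp, negatives=263_451)
print(f"recall={recall:.1%}, precision={precision:.1%}, FPR={fpr:.3%}")

# Tokens longer than three characters: 102 of 108 found, 175 flagged.
long_recall = 102 / 108
long_precision = 102 / 175
print(f"long tokens: recall={long_recall:.1%}, precision={long_precision:.1%}")
```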

CONCLUSIONS

Automated redaction of names in free-text narratives of adverse event reports can achieve sufficient recall including shorter tokens like patient initials. In-depth review shows that the rare leaks that occur tend not to compromise patient confidentiality. Precision and false positive rates are acceptable with almost all clinically relevant information retained.

Similar articles

1
Automated redaction of names in adverse event reports using transformer-based neural networks.
BMC Med Inform Decis Mak. 2024 Dec 23;24(1):401. doi: 10.1186/s12911-024-02785-9.
2
Evaluation of patient reporting of adverse drug reactions to the UK 'Yellow Card Scheme': literature review, descriptive and qualitative analyses, and questionnaire surveys.
Health Technol Assess. 2011 May;15(20):1-234, iii-iv. doi: 10.3310/hta15200.
3
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
4
Identifying and managing adverse drug reactions: Qualitative analysis of patient reports to the UK yellow card scheme.
Br J Clin Pharmacol. 2022 Jul;88(7):3434-3446. doi: 10.1111/bcp.15263. Epub 2022 Mar 23.
5
The pattern of name tokens in narrative clinical text and a comparison of five systems for redacting them.
J Am Med Inform Assoc. 2014 May-Jun;21(3):423-31. doi: 10.1136/amiajnl-2013-001689. Epub 2013 Sep 11.
6
From narrative descriptions to MedDRA: automagically encoding adverse drug reactions.
J Biomed Inform. 2018 Aug;84:184-199. doi: 10.1016/j.jbi.2018.07.001. Epub 2018 Jul 4.
7
Automatic Extraction of Comprehensive Drug Safety Information from Adverse Drug Event Narratives in the Korea Adverse Event Reporting System Using Natural Language Processing Techniques.
Drug Saf. 2023 Aug;46(8):781-795. doi: 10.1007/s40264-023-01323-2. Epub 2023 Jun 17.
8
Automated de-identification of free-text medical records.
BMC Med Inform Decis Mak. 2008 Jul 24;8:32. doi: 10.1186/1472-6947-8-32.
9
Text de-identification for privacy protection: a study of its impact on clinical text information content.
J Biomed Inform. 2014 Aug;50:142-50. doi: 10.1016/j.jbi.2014.01.011. Epub 2014 Feb 3.
10
Implementation and comparison of two text mining methods with a standard pharmacovigilance method for signal detection of medication errors.
BMC Med Inform Decis Mak. 2020 May 24;20(1):94. doi: 10.1186/s12911-020-1097-0.
