De-identification of Clinical Text via Bi-LSTM-CRF with Neural Language Models.

Authors

Tang Buzhou, Jiang Dehuan, Chen Qingcai, Wang Xiaolong, Yan Jun, Shen Ying

Affiliation

Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology, Shenzhen, China.

Corresponding author:

Publication information

AMIA Annu Symp Proc. 2020 Mar 4;2019:857-863. eCollection 2019.

PMID: 32308882
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7153082/
Abstract

De-identification of clinical text, a prerequisite for reusing electronic clinical data, is a typical named entity recognition (NER) problem. A number of state-of-the-art deep learning methods for NER, such as Bi-LSTM-CRF (bidirectional long short-term memory with conditional random fields), have been applied to de-identification. Neural language models used for language representation bring large improvements on many NLP tasks when they are integrated with other deep learning methods. In this paper, we introduce Bi-LSTM-CRF with neural language models for de-identification of clinical text, and evaluate it on the de-identification datasets of the i2b2 2014 and the CEGS N-GRID 2016 challenges. Four neural language models of three types, each integrated individually with Bi-LSTM-CRF, are compared in this study. Bi-LSTM-CRF with neural language models achieves the highest "strict" micro-averaged F1-score of 95.50% on the i2b2 2014 dataset and 91.82% on the CEGS N-GRID 2016 dataset, setting new benchmark results on both datasets.

Keywords: De-identification, Named entity recognition, Bidirectional long short-term memory, Conditional random fields, Neural language models.
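The abstract frames de-identification as NER and reports a "strict" micro-averaged F1-score. The sketch below illustrates one common reading of that metric. It is not the challenge organizers' evaluation script or the authors' code. It assumes that tokens carry BIO tags, that "strict" means a predicted entity counts only when its boundaries and type both match the gold entity exactly, and that micro-averaging pools counts over all entity types. The function names `bio_to_spans` and `strict_micro_f1` and the example tags are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, entity type)


def bio_to_spans(tags: Iterable[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        # "B-" always opens an entity; a stray "I-" is treated as an opening.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def strict_micro_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 with exact span-and-type matching (assumed 'strict')."""
    true_positives = len(gold & pred)  # counts pooled over all entity types
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: the predicted NAME entity is one token too short,
# so it does not count under strict matching.
gold_tags: List[str] = ["B-NAME", "I-NAME", "O", "B-DATE", "O", "B-CITY"]
pred_tags: List[str] = ["B-NAME", "O", "O", "B-DATE", "O", "B-CITY"]

precision, recall, f1 = strict_micro_f1(bio_to_spans(gold_tags), bio_to_spans(pred_tags))
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

In this example, two of the three predicted entities match a gold entity exactly, so precision, recall, and F1 all equal 2/3. A more lenient criterion that accepts partial overlap would also credit the truncated NAME entity.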
