
Entity recognition from clinical texts via recurrent neural network.

Affiliations

Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, 518055, China.

Pharmacy Department, Shenzhen Second People's Hospital, First Affiliated Hospital of Shenzhen University, Shenzhen, 518035, China.

Publication information

BMC Med Inform Decis Mak. 2017 Jul 5;17(Suppl 2):67. doi: 10.1186/s12911-017-0468-7.

DOI: 10.1186/s12911-017-0468-7
PMID: 28699566
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5506598/
Abstract

BACKGROUND

Entity recognition is one of the most fundamental steps in text analysis and has long attracted considerable attention from researchers. In the clinical domain, various types of entities, such as clinical entities and protected health information (PHI), are widespread in clinical texts. Recognizing these entities has become a hot topic in clinical natural language processing (NLP), and a large number of traditional machine learning methods, such as support vector machines and conditional random fields, have been deployed to recognize entities in clinical texts over the past few years. More recently, the recurrent neural network (RNN), a deep learning method that has shown great potential on many problems including named entity recognition, has gradually been applied to entity recognition from clinical texts.
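Entity recognition of this kind is commonly cast as sequence labeling, where each token receives one tag. The abstract does not specify a tagging scheme, so the sketch below assumes the widely used BIO scheme (B- begins an entity, I- continues it, O is outside), token offsets with an exclusive end, and no overlapping entities. The function name `spans_to_bio`, the example sentence, and its labels (styled after a clinical concept type and a PHI category) are hypothetical.

```python
from typing import List, Tuple

# Hypothetical annotation format: (start_token, end_token_exclusive, entity_type).
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping entity spans as one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, etype in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"invalid span {(start, end, etype)}")
        # Plain BIO cannot represent nested or overlapping entities.
        if any(t != "O" for t in tags[start:end]):
            raise ValueError("overlapping spans are not supported in plain BIO")
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


if __name__ == "__main__":
    # Illustrative sentence and labels, not taken from the i2b2 corpora.
    tokens = ["Patient", "denies", "chest", "pain", "since", "March", "3"]
    spans = [(2, 4, "problem"), (5, 7, "DATE")]
    for tok, tag in zip(tokens, spans_to_bio(len(tokens), spans)):
        print(f"{tok}\t{tag}")
```

Under this encoding, a tagger only has to emit one label per word, and entities are recovered by reading off each B- tag and the I- tags that follow it.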

METHODS

In this paper, we comprehensively investigate the performance of LSTM (long short-term memory), a representative variant of RNN, on clinical entity recognition and protected health information recognition. The LSTM model consists of three layers: the input layer, which generates a representation of each word in a sentence; the LSTM layer, which outputs another word representation sequence that captures the context information of each word in the sentence; and the inference layer, which makes tagging decisions based on the output of the LSTM layer, that is, outputs a label sequence.
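The three-layer architecture can be illustrated with a minimal, untrained forward pass in plain Python. This is a sketch under assumptions, not the authors' implementation: the LSTM layer is assumed to be bidirectional (forward and backward states concatenated so each word sees context on both sides), the inference layer is simplified to an independent per-token argmax over label scores (a CRF-style decoder would instead score whole label sequences), and all names (`LSTMTagger`, `LSTMDirection`), dimensions, lowercasing, and the random initialization are hypothetical. Training is omitted.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LSTMDirection:
    """One direction of an LSTM; all four gates are computed from [x_t; h_{t-1}]."""

    def __init__(self, in_dim: int, hid: int, rng: random.Random):
        self.hid = hid
        # Row blocks: input gate, forget gate, output gate, candidate cell.
        self.w = rand_matrix(4 * hid, in_dim + hid, rng)
        self.b = [0.0] * (4 * hid)

    def run(self, xs: Sequence[Vector]) -> List[Vector]:
        H = self.hid
        h, c = [0.0] * H, [0.0] * H
        outputs = []
        for x in xs:
            z = [a + b for a, b in zip(matvec(self.w, list(x) + h), self.b)]
            i = [sigmoid(v) for v in z[0:H]]
            f = [sigmoid(v) for v in z[H:2 * H]]
            o = [sigmoid(v) for v in z[2 * H:3 * H]]
            g = [math.tanh(v) for v in z[3 * H:4 * H]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


class LSTMTagger:
    def __init__(self, vocab: Dict[str, int], labels: List[str],
                 emb_dim: int = 8, hid: int = 6, seed: int = 0):
        rng = random.Random(seed)
        self.vocab, self.labels = vocab, labels
        # Input layer: one embedding row per vocabulary entry plus one for unknown words.
        self.emb = rand_matrix(len(vocab) + 1, emb_dim, rng)
        # LSTM layer: assumed bidirectional here.
        self.fwd = LSTMDirection(emb_dim, hid, rng)
        self.bwd = LSTMDirection(emb_dim, hid, rng)
        # Inference layer: linear scores per label from [h_fwd; h_bwd], then argmax.
        self.out = rand_matrix(len(labels), 2 * hid, rng)

    def tag(self, words: Sequence[str]) -> List[str]:
        unk = len(self.vocab)
        xs = [self.emb[self.vocab.get(w.lower(), unk)] for w in words]
        hf = self.fwd.run(xs)
        hb = self.bwd.run(xs[::-1])[::-1]  # re-align backward states with word order
        tags = []
        for f, b in zip(hf, hb):
            scores = matvec(self.out, f + b)
            tags.append(self.labels[max(range(len(scores)), key=scores.__getitem__)])
        return tags


if __name__ == "__main__":
    words = ["Patient", "denies", "chest", "pain"]
    vocab = {w.lower(): i for i, w in enumerate(words)}
    tagger = LSTMTagger(vocab, ["O", "B-problem", "I-problem"])
    for word, tag in zip(words, tagger.tag(words)):
        print(word, tag)
```

With random weights the predicted tags are arbitrary; the sketch only shows the data flow from word indices to embeddings, from embeddings to contextual states, and from states to one label per word.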

RESULTS

Experiments conducted on the corpora of the 2010, 2012 and 2014 i2b2 NLP challenges show that LSTM achieves its highest micro-averaged F1-scores of 85.81% on the 2010 i2b2 medical concept extraction task, 92.29% on the 2012 i2b2 clinical event detection task, and 94.37% on the 2014 i2b2 de-identification task, which are highly competitive with other state-of-the-art systems.
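Micro-averaging pools true positives, false positives and false negatives across all entity types before computing precision, recall and F1, so frequent types weigh more than rare ones. A minimal sketch, assuming exact matching on document, span offsets and type (the official i2b2 evaluation scripts may apply other matching rules); the mentions are made up for illustration, `micro_prf` and `macro_f1` are hypothetical helper names, and the macro average is included only for contrast.

```python
from typing import Set, Tuple

# One entity mention: (document_id, start_token, end_token_exclusive, entity_type).
Mention = Tuple[str, int, int, str]


def micro_prf(gold: Set[Mention], pred: Set[Mention]) -> Tuple[float, float, float]:
    """Pool counts over all types, then compute precision, recall and F1."""
    tp = len(gold & pred)  # predicted mentions that exactly match a gold mention
    fp = len(pred - gold)  # predicted mentions with no exact gold match
    fn = len(gold - pred)  # gold mentions that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(gold: Set[Mention], pred: Set[Mention]) -> float:
    """For contrast: compute F1 per entity type, then take the unweighted mean."""
    types = sorted({m[3] for m in gold | pred})
    if not types:
        return 0.0
    per_type = [
        micro_prf({m for m in gold if m[3] == t}, {m for m in pred if m[3] == t})[2]
        for t in types
    ]
    return sum(per_type) / len(per_type)


if __name__ == "__main__":
    gold = {("d1", 2, 4, "problem"), ("d1", 5, 7, "DATE"),
            ("d2", 0, 1, "test"), ("d2", 3, 5, "treatment")}
    pred = {("d1", 2, 4, "problem"), ("d1", 5, 6, "DATE"),  # DATE span too short
            ("d2", 3, 5, "treatment")}
    p, r, f1 = micro_prf(gold, pred)
    print(f"micro P={p:.3f} R={r:.3f} F1={f1:.3f}")
    print(f"macro F1={macro_f1(gold, pred):.3f}")
```

On this toy data there are 2 true positives, 1 false positive and 2 false negatives, so micro-F1 is 4/7 (about 0.571), while the macro average is 0.5 because the DATE and test types each score 0.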

CONCLUSIONS

LSTM, which requires no hand-crafted features, has great potential for entity recognition from clinical texts. It outperforms traditional machine learning methods, which suffer from laborious feature engineering. One possible future direction is integrating the knowledge bases widely available in the clinical domain into LSTM, which we plan to address in future work. Another is using LSTM to recognize entities in specific formats.


Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b58/5506598/a0c9414760cf/12911_2017_468_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b58/5506598/7819b0acb064/12911_2017_468_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b58/5506598/78e7ddab76a4/12911_2017_468_Fig3_HTML.jpg

