A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records.

Affiliation

Communication & Computer Network Lab of Guangdong, School of Computer Science and Engineering, South China University of Technology, Guangzhou, China.

Publication information

BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):65. doi: 10.1186/s12911-019-0762-7.

PMID: 30961622
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6454585/

Abstract

BACKGROUND

The named entity recognition (NER) task, a key step in extracting health information, faces many challenges in Chinese electronic medical records (EMRs). First, the casual use of Chinese abbreviations and doctors' individual writing styles can produce multiple expressions of the same entity, and we lack a common Chinese medical dictionary for accurate entity extraction. Second, EMRs contain entities from many categories, and entity length varies greatly across categories, which makes Chinese NER more difficult. Entity boundary detection therefore becomes the key to accurate entity extraction from Chinese EMRs, and we need a model that recognizes entities of varying lengths without relying on any medical dictionary.
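Because the task is framed as entity boundary detection over characters, it may help to see how entities of very different lengths are commonly encoded for a character-level tagger. The sketch below assumes the widely used BIO scheme; the abstract does not say which labeling scheme the authors use. The example sentence, the labels `Disease` and `Symptom`, and the helper names `spans_to_bio` and `bio_to_spans` are illustrative only.

```python
from typing import List, Optional, Tuple

# An entity span: (start, end_exclusive, label) over character positions.
Span = Tuple[int, int, str]


def spans_to_bio(length: int, spans: List[Span]) -> List[str]:
    """Tag each character: B-<label> opens an entity, I-<label> continues it, O is outside."""
    tags = ["O"] * length
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover entity spans, and hence entity boundaries, from a BIO tag sequence."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        continues = tag.startswith("I-") and tag[2:] == label
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        # A B- tag, or a stray I- tag with no open entity, starts a new span.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


# Toy sentence: a six-character entity followed by a two-character one.
text = "右肺上叶结节伴咳嗽"
gold = [(0, 6, "Disease"), (7, 9, "Symptom")]
tags = spans_to_bio(len(text), gold)
print(tags)
# ['B-Disease', 'I-Disease', 'I-Disease', 'I-Disease', 'I-Disease', 'I-Disease', 'O', 'B-Symptom', 'I-Symptom']
print(bio_to_spans(tags) == gold)  # True
```

Under this encoding, getting a long entity right means placing exactly one `B-` tag and an unbroken run of `I-` tags, which is why boundary errors weigh heavily on long entities.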

METHODS

In this study, we incorporate part-of-speech (POS) information into a deep learning model to improve the accuracy of Chinese entity boundary detection. To avoid incorrect POS tagging of long entities, we propose a method called reduced POS tagging, which retains the tags of general words but drops the tags of words that appear to be medical entities. The proposed model, named SM-LSTM-CRF, consists of three layers: a self-matching attention layer, which calculates the relevance of each character to the entire sentence; an LSTM (long short-term memory) layer, which captures the contextual features of each character; and a CRF (conditional random field) layer, which labels characters based on their features and transition rules.
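A minimal, dependency-free sketch of a self-matching attention layer follows, showing what "the relevance of each character to the entire sentence" can look like in code. It assumes the additive scoring form popularized by R-Net, s_tj = vᵀ tanh(W1 x_j + W2 x_t), and assumes that each character's context vector is concatenated to its input before the LSTM layer. The abstract does not give the authors' exact formulation; the weights below are random toy values, not trained parameters, and the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    top = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_matching_attention(
    chars: List[Vector], w1: Matrix, w2: Matrix, v: Vector
) -> Tuple[List[Vector], List[Vector]]:
    """Score every character j against each character t with v . tanh(W1 x_j + W2 x_t),
    normalize over the whole sentence, and return [x_t ; c_t] with the weights."""
    keys = [matvec(w1, x_j) for x_j in chars]  # W1 x_j, reused for every t
    outputs, weights = [], []
    for x_t in chars:
        query = matvec(w2, x_t)
        scores = [
            sum(vk * math.tanh(kk + qk) for vk, kk, qk in zip(v, key, query))
            for key in keys
        ]
        alpha = softmax(scores)  # relevance of each character to character t
        # Context vector c_t: attention-weighted sum of all character vectors.
        context = [
            sum(a * x_j[k] for a, x_j in zip(alpha, chars)) for k in range(len(x_t))
        ]
        outputs.append(x_t + context)  # list concatenation gives [x_t ; c_t]
        weights.append(alpha)
    return outputs, weights


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, h, n = 4, 3, 5  # toy sizes: embedding dim, attention dim, sentence length
chars = random_matrix(n, d, rng)  # stand-in character embeddings
w1, w2 = random_matrix(h, d, rng), random_matrix(h, d, rng)
v = random_matrix(1, h, rng)[0]
outputs, weights = self_matching_attention(chars, w1, w2, v)
print(len(outputs), len(outputs[0]))  # 5 8
```

Each row of `weights` is a distribution over the whole sentence, so every character's output carries sentence-level context into the LSTM layer. A real implementation would use a tensor library and learn `w1`, `w2`, and `v` jointly with the LSTM and CRF layers.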

RESULTS

Experimental results on a Chinese EMR dataset show that SM-LSTM-CRF improves the F1 score by 2.59% over LSTM-CRF. Adding the POS feature to the model improves F1 by about 7.74%. Reduced POS tagging lowers the number of false tags on long entities, increasing F1 by 2.42% and reaching an F1 score of 80.07%.
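The results compare models by F1. As a sketch of how entity-level F1 is commonly computed, the function below counts a predicted entity as correct only if both its boundaries and its label exactly match a gold entity. The abstract does not state the matching criterion, so exact matching is an assumption here, and `entity_prf` is a hypothetical name.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def entity_prf(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 under exact span-and-label matching."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = [(0, 6, "Disease"), (7, 9, "Symptom")]
# The long entity is cut short, so it counts as one false positive and one false negative.
predicted = [(0, 4, "Disease"), (7, 9, "Symptom")]
print(entity_prf(gold, predicted))  # (0.5, 0.5, 0.5)
```

The example shows why boundary errors on long entities are costly under exact matching: a single truncated entity lowers precision and recall at the same time.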

CONCLUSIONS

The POS feature produced by reduced POS tagging, combined with the self-matching attention mechanism, tightly constrains entity boundaries and performs well in recognizing clinical entities.
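The reduced POS tagging step can be sketched as preprocessing that turns word-level POS tags into one feature per character and masks the tags of words that look like medical entities. The abstract does not say how such words are identified or how the features are encoded, so the `looks_medical` predicate, the placeholder tag `X`, the hand-written segmentation, and the function name `reduced_pos_features` are all hypothetical.

```python
from typing import Callable, List, Tuple

MASKED = "X"  # hypothetical placeholder for tags that are deliberately dropped


def reduced_pos_features(
    tokens: List[Tuple[str, str]], looks_medical: Callable[[str], bool]
) -> List[str]:
    """Expand word-level (word, POS) pairs to one feature per character.
    General words keep their POS tag; words that look like medical entities are masked,
    so that possibly incorrect tags inside long entities are not passed to the model."""
    features: List[str] = []
    for word, pos in tokens:
        tag = MASKED if looks_medical(word) else pos
        features.extend([tag] * len(word))  # every character inherits its word's tag
    return features


# Toy segmentation of "右肺上叶结节伴咳嗽" with hand-written tags (not real segmenter output).
tokens = [("右肺", "n"), ("上叶", "n"), ("结节", "n"), ("伴", "v"), ("咳嗽", "v")]
# Stand-in rule: a toy word list, used only to make the example run.
toy_medical_words = {"右肺", "上叶", "结节", "咳嗽"}
print(reduced_pos_features(tokens, toy_medical_words.__contains__))
# ['X', 'X', 'X', 'X', 'X', 'X', 'v', 'X', 'X']
```

Only the general word 伴 keeps its tag, so the long six-character span carries no word-internal POS distinctions that the tagger might mistake for entity boundaries.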

Similar articles

1
Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records.
BMC Med Inform Decis Mak. 2022 Mar 23;22(1):72. doi: 10.1186/s12911-022-01810-z.
2
An attention-based deep learning model for clinical named entity recognition of Chinese electronic medical records.
BMC Med Inform Decis Mak. 2019 Dec 5;19(Suppl 5):235. doi: 10.1186/s12911-019-0933-6.
3
A hybrid approach for named entity recognition in Chinese electronic medical record.
BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):64. doi: 10.1186/s12911-019-0767-2.
4
A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text.
BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):66. doi: 10.1186/s12911-019-0770-7.
5
Improving the Named Entity Recognition of Chinese Electronic Medical Records by Combining Domain Dictionary and Rules.
Int J Environ Res Public Health. 2020 Apr 14;17(8):2687. doi: 10.3390/ijerph17082687.
6
Chinese clinical named entity recognition with radical-level feature and self-attention mechanism.
J Biomed Inform. 2019 Oct;98:103289. doi: 10.1016/j.jbi.2019.103289. Epub 2019 Sep 18.
7
Clinical Named Entity Recognition From Chinese Electronic Health Records via Machine Learning Methods.
JMIR Med Inform. 2018 Dec 17;6(4):e50. doi: 10.2196/medinform.9965.
8
Entity recognition in Chinese clinical text using attention-based CNN-LSTM-CRF.
BMC Med Inform Decis Mak. 2019 Apr 4;19(Suppl 3):74. doi: 10.1186/s12911-019-0787-y.
9
Entity recognition from clinical texts via recurrent neural network.
BMC Med Inform Decis Mak. 2017 Jul 5;17(Suppl 2):67. doi: 10.1186/s12911-017-0468-7.

Cited by

1
Semantic-enhanced graph neural network for named entity recognition in ancient Chinese books.
Sci Rep. 2024 Jul 30;14(1):17488. doi: 10.1038/s41598-024-68561-x.
2
Application of Entity-BERT model based on neuroscience and brain-like cognition in electronic medical record entity recognition.
Front Neurosci. 2023 Sep 20;17:1259652. doi: 10.3389/fnins.2023.1259652. eCollection 2023.
3
Named Entity Recognition of Diabetes Online Health Community Data Using Multiple Machine Learning Models.
Bioengineering (Basel). 2023 May 29;10(6):659. doi: 10.3390/bioengineering10060659.
4
Named Entity Recognition of Medical Text Based on the Deep Neural Network.
J Healthc Eng. 2022 Mar 7;2022:3990563. doi: 10.1155/2022/3990563. eCollection 2022.
5
A Deep Language Model for Symptom Extraction From Clinical Text and its Application to Extract COVID-19 Symptoms From Social Media.
IEEE J Biomed Health Inform. 2022 Apr;26(4):1737-1748. doi: 10.1109/JBHI.2021.3123192. Epub 2022 Apr 14.
6
Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation.
JMIR Med Inform. 2020 May 4;8(5):e17637. doi: 10.2196/17637.
7
Constructing fine-grained entity recognition corpora based on clinical records of traditional Chinese medicine.
BMC Med Inform Decis Mak. 2020 Apr 6;20(1):64. doi: 10.1186/s12911-020-1079-2.
8
An attention-based deep learning model for clinical named entity recognition of Chinese electronic medical records.
BMC Med Inform Decis Mak. 2019 Dec 5;19(Suppl 5):235. doi: 10.1186/s12911-019-0933-6.
9
Deep learning in clinical natural language processing: a methodical review.
J Am Med Inform Assoc. 2020 Mar 1;27(3):457-470. doi: 10.1093/jamia/ocz200.