
Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation.

Authors

Zhang Zhichang, Zhu Lin, Yu Peilin

Affiliation

College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China.

Publication

JMIR Med Inform. 2020 May 4;8(5):e17637. doi: 10.2196/17637.

DOI:10.2196/17637
PMID:32364514
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7235813/
Abstract

BACKGROUND

Medical entity recognition is a key technology that supports the development of smart medicine. Methods for English medical entity recognition have advanced considerably, but progress for Chinese has been slow. Because of the complexity of the Chinese language and the limitations of annotated corpora, existing methods for Chinese rely on simple neural networks, which cannot effectively extract deep semantic representations of electronic medical records (EMRs) and are poorly suited to scarce medical corpora. We therefore developed a new Chinese EMR (CEMR) dataset with six types of entities and proposed a multi-level representation learning model based on Bidirectional Encoder Representations from Transformers (BERT) for Chinese medical entity recognition.

OBJECTIVE

This study aimed to improve the performance of the language model on Chinese medical entity recognition by having it learn multi-level representations.

METHODS

We investigated the pretrained language representation model and found that using information from the intermediate layers, not only the final layer, affects performance on the Chinese medical entity recognition task. We therefore proposed a multi-level representation learning model for entity recognition in Chinese EMRs. Specifically, we first used the BERT language model to extract semantic representations. Then, a multi-head attention mechanism was used to automatically extract deeper semantic information from each layer. Finally, the semantic representations obtained from this multi-level extraction were used as the final semantic context embedding for each token, and softmax was used to predict the entity tags.
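The three steps above can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: it assumes each token already has one vector per encoder layer, uses a single hypothetical learned query vector (`query`) with no key or value projections, and classifies with hypothetical weights `W` and `b` over a hypothetical tag set. It only shows the general idea of attending over layers per head, concatenating the heads, and applying softmax.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layer_vecs, query, num_heads):
    """Fuse one token's per-layer vectors by multi-head attention over layers.

    layer_vecs: list of L vectors (one per encoder layer), each of length d.
    query:      hypothetical learned query vector of length d (an assumption).
    Each head sees its own slice of the d dimensions, scores every layer with
    a scaled dot product, and returns the attention-weighted sum of its slice.
    """
    d = len(query)
    if d % num_heads != 0:
        raise ValueError("vector size must be divisible by num_heads")
    head_dim = d // num_heads
    fused = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q = query[lo:hi]
        scores = [
            sum(qi * vi for qi, vi in zip(q, vec[lo:hi])) / math.sqrt(head_dim)
            for vec in layer_vecs
        ]
        weights = softmax(scores)  # one weight per layer, summing to 1
        for j in range(lo, hi):
            fused.append(sum(w * vec[j] for w, vec in zip(weights, layer_vecs)))
    return fused


def predict_tag(fused, W, b, tags):
    """Linear layer followed by softmax; W, b, and tags are hypothetical."""
    logits = [sum(w * x for w, x in zip(row, fused)) + bias
              for row, bias in zip(W, b)]
    probs = softmax(logits)
    best = max(range(len(tags)), key=probs.__getitem__)
    return tags[best], probs


if __name__ == "__main__":
    # Toy input: 3 layers, 4-dimensional vectors, 2 heads.
    layers = [[0.1, 0.4, -0.2, 0.3], [0.5, 0.1, 0.0, -0.1], [0.2, 0.2, 0.6, 0.4]]
    token_vec = fuse_layers(layers, [1.0, 0.5, -0.5, 0.2], num_heads=2)
    tag, probs = predict_tag(
        token_vec,
        W=[[0.3, -0.1, 0.2, 0.0], [-0.2, 0.4, 0.1, 0.5]],
        b=[0.0, 0.0],
        tags=["O", "B-ENT"],
    )
    print(tag, probs)
```

In a real model the per-layer vectors would come from the encoder's hidden states and all parameters would be trained end to end; the sketch fixes them by hand only to keep the example self-contained.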

RESULTS

The best F1 score in our experiments was 82.11% on the CEMR dataset and 83.18% on the CCKS (China Conference on Knowledge Graph and Semantic Computing) 2018 benchmark dataset. Comparative experiments showed that the proposed method outperforms methods from previous work, setting a new state of the art.
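For readers unfamiliar with the metric, the following is a minimal sketch of how an entity-level F1 score can be computed. The abstract does not state the matching criterion, so exact match on span boundaries and entity type is an assumption here, and the entity type names in the usage example are hypothetical.

```python
def entity_f1(gold, pred):
    """Return (precision, recall, F1) for entity spans.

    gold, pred: iterables of (start, end, entity_type) tuples.
    A predicted entity counts as correct only if the same tuple appears in
    the gold set (exact-match criterion, assumed for this sketch).
    """
    gold, pred = set(gold), set(pred)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical annotations: the second prediction has a wrong end offset.
    gold = [(0, 2, "Disease"), (5, 7, "Drug")]
    pred = [(0, 2, "Disease"), (5, 8, "Drug")]
    print(entity_f1(gold, pred))
```

F1 is the harmonic mean of precision and recall, so a system has to do well on both to score highly.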

CONCLUSIONS

We propose the multi-level representation learning model as a method for entity recognition in Chinese EMRs. Experiments on two clinical datasets demonstrate the usefulness of using the multi-head attention mechanism to extract multi-level representations as part of the language model.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/360b/7235813/7dee42cfc68b/medinform_v8i5e17637_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/360b/7235813/fd45dbb0a957/medinform_v8i5e17637_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/360b/7235813/8262690010f3/medinform_v8i5e17637_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/360b/7235813/e0ece48283ca/medinform_v8i5e17637_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/360b/7235813/529705858b91/medinform_v8i5e17637_fig5.jpg
