Improving the Named Entity Recognition of Chinese Electronic Medical Records by Combining Domain Dictionary and Rules.

Author information

School of Computer, University of South China, Hengyang 421001, China.

Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN 47408, USA.

Publication information

Int J Environ Res Public Health. 2020 Apr 14;17(8):2687. doi: 10.3390/ijerph17082687.

DOI:10.3390/ijerph17082687

PMID:32295174

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7215438/

Abstract

Electronic medical records are an integral part of medical texts, and entity recognition in these records has prompted many studies proposing a variety of entity extraction methods. In this paper, we propose a model for extracting entities from Chinese Electronic Medical Records (CEMR). In the input layer, the model uses word embeddings and dictionary feature embeddings as input vectors, where each word embedding consists of a character representation and a word representation. The input vectors are then fed to a bidirectional long short-term memory (Bi-LSTM) network to capture contextual features. Finally, a conditional random field (CRF) captures dependencies between neighboring tags. On a body classification task, the model reached an F1 value of 90.65%; on an anatomic region recognition task, it reached 93.89%. On both tasks, our model outperformed state-of-the-art models such as Bi-LSTM-CRF, Bi-LSTM-Attention, and Vote. The experiments also show that the model handles low-frequency entities and unknown entities well; with a small training dataset, our method improved the F1 value by 2-4% over the basic Bi-LSTM-CRF models. Additionally, on the anatomic region recognition task, we combined the proposed model with 12 rules we designed and a domain dictionary. With this combination, the weighted F1 value for the extraction of three specific entities reached 84.36%.
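
The dictionary features mentioned for the input layer can be illustrated with a small sketch. It assumes a forward-maximum-matching lookup and a B/I/E/S/O position scheme; the abstract does not say which matching strategy or tag scheme the authors used, and the function name, the `max_len` parameter, and the toy lexicon below are hypothetical.

```python
from typing import List, Set


def dictionary_features(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Tag each character with its position inside the longest dictionary match.

    B/I/E mark the beginning, inside, and end of a multi-character match,
    S marks a single-character match, and O marks characters with no match.
    (Assumed scheme, not taken from the paper.)
    """
    tags = ["O"] * len(sentence)
    i = 0
    while i < len(sentence):
        match_len = 0
        # Forward maximum matching: try the longest candidate first.
        for n in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + n] in lexicon:
                match_len = n
                break
        if match_len == 0:
            i += 1  # no dictionary entry starts here
            continue
        if match_len == 1:
            tags[i] = "S"
        else:
            tags[i] = "B"
            for j in range(i + 1, i + match_len - 1):
                tags[j] = "I"
            tags[i + match_len - 1] = "E"
        i += match_len  # skip past the matched span
    return tags


# Hypothetical toy dictionary of anatomy terms, for illustration only.
LEXICON = {"右肺", "右肺上叶", "肝"}
features = dictionary_features("右肺上叶见结节", LEXICON)
print(features)
```

In a model of the kind the abstract describes, each such tag would presumably be mapped to a small trainable embedding and concatenated with the character and word representations before the Bi-LSTM layer; that wiring is an assumption here, not a detail confirmed by the abstract.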
