Clinical Named Entity Recognition From Chinese Electronic Health Records via Machine Learning Methods.

Author information

Zhang Yu, Wang Xuwen, Hou Zhen, Li Jiao

Affiliations

Institute of Medical Information and Library, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China.

Publication information

JMIR Med Inform. 2018 Dec 17;6(4):e50. doi: 10.2196/medinform.9965.

DOI:10.2196/medinform.9965

PMID:30559093

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6315256/

Abstract

BACKGROUND

Electronic health records (EHRs) are important data resources for clinical studies and applications. Physicians or clinicians describe patients' disorders or treatment procedures in EHRs using free text (unstructured) clinical notes. The narrative information plays an important role in patient treatment and clinical research. However, it is challenging to make machines understand the clinical narratives.

OBJECTIVE

This study aimed to automatically identify Chinese clinical entities from free text in EHRs and make machines semantically understand diagnoses, tests, body parts, symptoms, treatments, and so on.

METHODS

The dataset used in this study is the benchmark dataset of human-annotated Chinese EHRs released by the China Conference on Knowledge Graph and Semantic Computing 2017 clinical named entity recognition challenge task. Two machine learning models, the conditional random fields (CRF) method and bidirectional long short-term memory (LSTM)-CRF, were applied to recognize clinical entities from Chinese EHR data. To train the CRF-based model, we selected features such as bag of Chinese characters, part-of-speech tags, character types, and the position of characters. For the bidirectional LSTM-CRF-based model, character embeddings and segmentation information were used as features. In addition, we employed a dictionary-based approach as the baseline for performance evaluation. Precision, recall, and the harmonic mean of precision and recall (F1 score) were used to evaluate the performance of the methods.
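To make the CRF feature description more concrete, the following is a minimal sketch of per-character feature extraction. It is not the authors' implementation: the function names, the window size, the character-type categories, and the reading of "position of characters" as begin/middle/end of the sentence are all assumptions made for illustration. Part-of-speech tags are omitted because they would require an external tagger. The resulting list of dictionaries is the kind of input that common CRF toolkits accept.

```python
import unicodedata


def char_type(ch):
    """Coarse character class (these categories are an assumption)."""
    if ch.isdigit():
        return "DIGIT"
    if ch.isascii() and ch.isalpha():
        return "LATIN"
    if unicodedata.category(ch).startswith("P"):
        return "PUNCT"
    if "\u4e00" <= ch <= "\u9fff":
        return "HAN"
    return "OTHER"


def char_position(i, n):
    """Position of the character in the sentence (assumed interpretation)."""
    if i == 0:
        return "BEGIN"
    if i == n - 1:
        return "END"
    return "MID"


def char_features(chars, i, window=2):
    """Feature dictionary for the character at index i."""
    feats = {
        "bias": 1.0,
        "char": chars[i],
        "type": char_type(chars[i]),
        "position": char_position(i, len(chars)),
    }
    # Bag of neighbouring characters within a fixed window.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats[f"char[{off:+d}]"] = chars[j] if 0 <= j < len(chars) else "<PAD>"
    return feats


def sentence_features(sentence, window=2):
    """One feature dictionary per character of the sentence."""
    chars = list(sentence)
    return [char_features(chars, i, window) for i in range(len(chars))]


# Hypothetical example sentence ("patient has had abdominal pain for 3 days").
example = sentence_features("患者腹痛3天。")
```

Each character receives its own feature dictionary, so a sequence labeler can assign one tag per character without requiring word segmentation first.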

RESULTS

Experiments on the test set showed that our methods were able to automatically identify types of Chinese clinical entities such as diagnosis, test, symptom, body part, and treatment simultaneously. With regard to overall performance, CRF and bidirectional LSTM-CRF achieved a precision of 0.9203 and 0.9112, recall of 0.8709 and 0.8974, and F1 score of 0.8949 and 0.9043, respectively. The results also indicated that our methods performed well in recognizing each type of clinical entity, in which the "symptom" type achieved the best F1 score of over 0.96. Moreover, as the number of features increased, the F1 score of the CRF model increased from 0.8547 to 0.8949.
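As a rough illustration of how precision, recall, and F1 relate to one another, the sketch below scores predicted entities against gold entities using exact span matching over BIO tags. The tagging scheme, the label names, and the exact-match criterion are assumptions for this example; the abstract does not state which matching rule the challenge used. The helper `f1_score` can also be used to check that the reported F1 values are consistent with the reported precision and recall, up to rounding.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def bio_to_spans(tags):
    """Convert a BIO tag sequence into a set of (start, end, label) spans."""
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # the current entity continues
        if label is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def evaluate(gold_tags, pred_tags):
    """Entity-level precision, recall, and F1 under exact span matching."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical tags: the prediction finds the symptom but misses the body part.
gold = ["B-SYMPTOM", "I-SYMPTOM", "O", "B-BODY", "O"]
pred = ["B-SYMPTOM", "I-SYMPTOM", "O", "O", "O"]
precision, recall, f1 = evaluate(gold, pred)

# Consistency check against the overall figures reported in the abstract.
crf_f1 = f1_score(0.9203, 0.8709)
bilstm_crf_f1 = f1_score(0.9112, 0.8974)
```

In the toy example the single predicted entity is correct, so precision is 1.0, while only one of the two gold entities is found, so recall is 0.5.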

CONCLUSIONS

In this study, we employed two computational methods to simultaneously identify types of Chinese clinical entities from free text in EHRs. With training, these methods can effectively identify various types of clinical entities (eg, symptom and treatment) with high accuracy. The deep learning model, bidirectional LSTM-CRF, can achieve better performance than the CRF model with little feature engineering. This study contributed to translating human-readable health information into machine-readable information.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1707/6315256/ec8dca5d1adb/medinform_v6i4e50_fig1.jpg
