Clinical Named Entity Recognition From Chinese Electronic Health Records via Machine Learning Methods.

Authors

Zhang Yu, Wang Xuwen, Hou Zhen, Li Jiao

Affiliation

Institute of Medical Information and Library, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China.

Publication

JMIR Med Inform. 2018 Dec 17;6(4):e50. doi: 10.2196/medinform.9965.


DOI:10.2196/medinform.9965
PMID:30559093
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6315256/
Abstract

BACKGROUND: Electronic health records (EHRs) are important data resources for clinical studies and applications. Physicians or clinicians describe patients' disorders or treatment procedures in EHRs using free text (unstructured) clinical notes. The narrative information plays an important role in patient treatment and clinical research. However, it is challenging to make machines understand the clinical narratives.

OBJECTIVE: This study aimed to automatically identify Chinese clinical entities from free text in EHRs and make machines semantically understand diagnoses, tests, body parts, symptoms, treatments, and so on.

METHODS: The dataset we used for this study is the benchmark dataset with human annotated Chinese EHRs, released by the China Conference on Knowledge Graph and Semantic Computing 2017 clinical named entity recognition challenge task. Overall, 2 machine learning models, the conditional random fields (CRF) method and bidirectional long short-term memory (LSTM)-CRF, were applied to recognize clinical entities from Chinese EHR data. To train the CRF-based model, we selected features such as bag of Chinese characters, part-of-speech tags, character types, and the position of characters. For the bidirectional LSTM-CRF-based model, character embeddings and segmentation information were used as features. In addition, we also employed a dictionary-based approach as the baseline for the purpose of performance evaluation. Precision, recall, and the harmonic average of precision and recall (F1 score) were used to evaluate the performance of the methods.

RESULTS: Experiments on the test set showed that our methods were able to automatically identify types of Chinese clinical entities such as diagnosis, test, symptom, body part, and treatment simultaneously. With regard to overall performance, CRF and bidirectional LSTM-CRF achieved a precision of 0.9203 and 0.9112, recall of 0.8709 and 0.8974, and F1 score of 0.8949 and 0.9043, respectively. The results also indicated that our methods performed well in recognizing each type of clinical entity, in which the "symptom" type achieved the best F1 score of over 0.96. Moreover, as the number of features increased, the F1 score of the CRF model increased from 0.8547 to 0.8949.

CONCLUSIONS: In this study, we employed two computational methods to simultaneously identify types of Chinese clinical entities from free text in EHRs. With training, these methods can effectively identify various types of clinical entities (eg, symptom and treatment) with high accuracy. The deep learning model, bidirectional LSTM-CRF, can achieve better performance than the CRF model with little feature engineering. This study contributed to translating human-readable health information into machine-readable information.
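
The abstract defines the F1 score as the harmonic average of precision and recall. As a minimal sketch of that relationship (not the authors' evaluation code), the snippet below recomputes F1 from the overall precision and recall quoted in the abstract. The function name `f1_score` and the `reported` dictionary are illustrative names introduced here, not anything taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision, recall, and F1 as quoted in the abstract.
reported = {
    "CRF": {"precision": 0.9203, "recall": 0.8709, "f1": 0.8949},
    "BiLSTM-CRF": {"precision": 0.9112, "recall": 0.8974, "f1": 0.9043},
}

# Recompute F1 for each model from its rounded precision and recall.
recomputed = {
    name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()
}

for name, value in recomputed.items():
    print(f"{name}: recomputed F1 = {value:.4f}, reported F1 = {reported[name]['f1']:.4f}")
```

The recomputed values should agree with the reported F1 scores to within about 0.0001. Small differences are expected because the precision and recall in the abstract are themselves rounded to four decimal places.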

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1707/6315256/1b1f3d152899/medinform_v6i4e50_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1707/6315256/ec8dca5d1adb/medinform_v6i4e50_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1707/6315256/2bdbebab4c7d/medinform_v6i4e50_fig2.jpg
