Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records.

Affiliations

Life Science College, Central South University, No. 932 South Lushan Road, Changsha, 410083, China.

Institute of Medical Information, Chinese Academy of Medical Sciences, No. 3 Yabao Road, Beijing, 100020, China.

Publication information

BMC Med Inform Decis Mak. 2022 Mar 23;22(1):72. doi: 10.1186/s12911-022-01810-z.

DOI: 10.1186/s12911-022-01810-z
PMID: 35321705
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8941801/
Abstract

OBJECTIVE

Pituitary adenomas are the most common type of pituitary disorder. They usually occur in young adults and often affect the patient's physical development, capacity for work, and fertility. The free-text clinical notes in the electronic medical records (EMRs) of patients with pituitary adenomas contain abundant diagnostic and treatment information, but this information remains underused because extracting it from unstructured clinical text is difficult. This study aims to enable machines to process clinical information intelligently and to automatically extract clinical named entities for pituitary adenomas from Chinese EMRs.

METHODS

The clinical corpus came from a neurosurgical treatment center for pituitary adenomas at a 3A hospital in China. Four fine-grained types of clinical record text were selected: notes on present illness, past medical history, case characteristics, and family history for 500 pituitary adenoma inpatients. Four methods were used to extract clinical entities from this Chinese EMR corpus: dictionary-based matching, conditional random fields (CRF), bidirectional long short-term memory with CRF (BiLSTM-CRF), and bidirectional encoder representations from transformers with BiLSTM-CRF (BERT-BiLSTM-CRF). For dictionary-based matching, a comprehensive dictionary was constructed from open-source vocabularies and a domain dictionary for pituitary adenomas. The CRF-based model was trained on features such as part of speech, radical, document type, and character position. The BiLSTM-CRF model took random character embeddings as input features, and the BERT-BiLSTM-CRF model took character embeddings pretrained by BERT. Performance was evaluated with both a strict metric and a relaxed metric.
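The dictionary-based matching baseline can be illustrated with a minimal sketch. The abstract does not say which matching algorithm was used, so the greedy forward maximum matching below is an assumption, and the toy lexicon, the entity type labels, and the function name `dictionary_match` are hypothetical stand-ins for the comprehensive dictionary described above.

```python
from typing import Dict, List, Tuple

# (start, end_exclusive, entity_type), measured in characters
Entity = Tuple[int, int, str]


def dictionary_match(text: str, lexicon: Dict[str, str]) -> List[Entity]:
    """Greedy forward maximum matching (an assumed strategy, not the paper's).

    At each character position, try the longest lexicon term first and
    fall back to shorter ones; skip one character when nothing matches.
    """
    if not lexicon:
        return []
    max_len = max(len(term) for term in lexicon)
    entities: List[Entity] = []
    i = 0
    while i < len(text):
        matched = False
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                entities.append((i, i + length, lexicon[candidate]))
                i += length  # continue after the matched term
                matched = True
                break
        if not matched:
            i += 1
    return entities


# Hypothetical toy lexicon; the real dictionary is not given in the abstract.
TOY_LEXICON = {
    "头痛": "symptom",
    "视力下降": "symptom",
    "垂体": "body_region",
    "垂体腺瘤": "disease",
}

sample = "患者头痛伴视力下降,诊断为垂体腺瘤"
found = dictionary_match(sample, TOY_LEXICON)
for start, end, label in found:
    print(sample[start:end], label)
```

Trying the longest term first means that "垂体腺瘤" is tagged as a disease rather than having its prefix "垂体" tagged as a body region, which is the usual motivation for maximum matching on unsegmented Chinese text.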

RESULTS

Experimental results showed that the deep learning and other machine learning methods could automatically extract clinical named entities from Chinese EMRs, including symptoms, body regions, diseases, family histories, surgeries, medications, and disease courses of pituitary adenomas. Overall, BERT-BiLSTM-CRF achieved the highest strict F1 value (91.27%) and the highest relaxed F1 value (95.57%). Additional evaluations showed that BERT-BiLSTM-CRF performed best for almost all entity types except surgery and disease course. For disease course entities, BiLSTM-CRF performed best, matching the CRF model with part-of-speech, radical, and document type features; both strict and relaxed F1 values reached 96.48%. For surgery entities, the CRF model with part-of-speech, radical, and document type features performed best, with a relaxed F1 value of 95.29%.
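The difference between strict and relaxed F1 values can be made concrete with a small sketch. The abstract does not define the two metrics, so the definitions here are assumptions based on common practice: a strict match requires identical boundaries and entity type, while a relaxed match requires the same entity type and any character overlap. The function name `prf` and the example spans are hypothetical.

```python
from typing import List, Tuple

# (start, end_exclusive, entity_type), measured in characters
Entity = Tuple[int, int, str]


def _overlaps(a: Entity, b: Entity) -> bool:
    """True when two half-open character spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def prf(gold: List[Entity], pred: List[Entity], relaxed: bool = False):
    """Return (precision, recall, F1) under the assumed strict or relaxed rule."""

    def hit(item: Entity, others: List[Entity]) -> bool:
        if relaxed:
            # Same type and overlapping span counts as a match.
            return any(item[2] == o[2] and _overlaps(item, o) for o in others)
        # Strict: boundaries and type must be identical.
        return item in others

    matched_pred = sum(hit(p, gold) for p in pred)
    matched_gold = sum(hit(g, pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold annotations and predictions for one sentence.
gold = [(2, 4, "symptom"), (5, 9, "symptom"), (13, 17, "disease")]
pred = [(2, 4, "symptom"), (7, 9, "symptom"), (13, 15, "body_region")]

strict_scores = prf(gold, pred, relaxed=False)
relaxed_scores = prf(gold, pred, relaxed=True)
print("strict  P/R/F1:", strict_scores)
print("relaxed P/R/F1:", relaxed_scores)
```

In this toy case the partially overlapping symptom span counts only under the relaxed rule, so the relaxed F1 is higher than the strict one, which is the same direction as the gap between the two overall values reported above.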

CONCLUSIONS

In this study, we applied four entity recognition methods to Chinese EMRs of pituitary adenoma patients. The results demonstrate that deep learning methods can effectively extract various types of clinical entities with satisfactory performance. This study contributes to clinical named entity extraction from Chinese neurosurgical EMRs, and the findings could also assist information extraction from other Chinese medical texts.
