
Does the magic of BERT apply to medical code assignment? A quantitative study.

Affiliation

Department of Computer Science, Aalto University, Espoo, 00076, Finland.

Publication information

Comput Biol Med. 2021 Dec;139:104998. doi: 10.1016/j.compbiomed.2021.104998. Epub 2021 Oct 30.

Abstract

Unsupervised pretraining is an integral part of many natural language processing systems, and transfer learning with language models has achieved remarkable results in downstream tasks. In the clinical application of medical code assignment, diagnosis and procedure codes are inferred from lengthy clinical notes such as hospital discharge summaries. However, it is not clear if pretrained models are useful for medical code prediction without further architecture engineering. This paper conducts a comprehensive quantitative analysis of various contextualized language models' performances, pretrained in different domains, for medical code assignment from clinical notes. We propose a hierarchical fine-tuning architecture to capture interactions between distant words and adopt label-wise attention to exploit label information. Contrary to current trends, we demonstrate that a carefully trained classical CNN outperforms attention-based models on a MIMIC-III subset with frequent codes. Our empirical findings suggest directions for building robust medical code assignment models.
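The label-wise attention mentioned in the abstract can be illustrated with a minimal sketch: each label (e.g., each ICD code) gets its own attention query over the token representations produced by the encoder, yielding a label-specific document vector that feeds a per-label score. The snippet below is a hypothetical PyTorch illustration under assumed dimensions and layer names; it is not the paper's exact implementation.

```python
# Minimal sketch of label-wise attention over token representations
# (hypothetical names and dimensions, not the authors' released code).
import torch
import torch.nn as nn

class LabelWiseAttention(nn.Module):
    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        # One attention query vector per label (e.g., per ICD code).
        self.query = nn.Linear(hidden_dim, num_labels, bias=False)
        # One output weight vector and bias per label for the final score.
        self.output = nn.Linear(hidden_dim, num_labels)

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden_dim) from a CNN or BERT encoder.
        scores = self.query(token_states)            # (batch, seq_len, num_labels)
        attn = torch.softmax(scores, dim=1)          # attend over tokens, per label
        # Label-specific document representations: (batch, num_labels, hidden_dim)
        doc_repr = torch.einsum("bsl,bsh->blh", attn, token_states)
        # Per-label logits via the label-specific output weights.
        logits = (doc_repr * self.output.weight).sum(-1) + self.output.bias
        return logits                                # (batch, num_labels)

# Usage sketch: encoder output (batch=2, seq_len=512, hidden_dim=768)
# scored against, say, the 50 most frequent codes.
# model = LabelWiseAttention(hidden_dim=768, num_labels=50)
# logits = model(torch.randn(2, 512, 768))
```

The design choice is that attention weights differ per label, so evidence for one code (e.g., a procedure mention) does not have to compete with evidence for another in a single pooled document vector.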

