Explainable automated coding of clinical notes using hierarchical label-wise attention networks and label embedding initialisation.

Affiliations

Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, United Kingdom; Health Data Research UK, London, United Kingdom.

Publication information

J Biomed Inform. 2021 Apr;116:103728. doi: 10.1016/j.jbi.2021.103728. Epub 2021 Mar 9.

DOI:10.1016/j.jbi.2021.103728

PMID:33711543

Abstract

BACKGROUND

Diagnostic or procedural coding of clinical notes aims to derive a coded summary of disease-related information about patients. Such coding is usually done manually in hospitals but could potentially be automated to improve the efficiency and accuracy of medical coding. Recent studies on deep learning for automated medical coding have achieved promising performance. However, the explainability of these models is usually poor, preventing them from being used confidently to support clinical practice. Another limitation is that these models mostly assume independence among labels, ignoring the complex correlations among medical codes, which could be exploited to improve performance.

METHODS

To address the issues of model explainability and label correlations, we first propose a Hierarchical Label-wise Attention Network (HLAN), which aims to interpret the model by quantifying the importance (as attention weights) of words and sentences related to each of the labels. Second, we propose to enhance the major deep learning models with a label embedding (LE) initialisation approach, which learns a dense, continuous vector representation of each label and then injects the representation into the final layers and the label-wise attention layers of the models. We evaluated the methods using three settings on the MIMIC-III discharge summaries: full codes, top-50 codes, and the UK NHS (National Health Service) COVID-19 (Coronavirus disease 2019) shielding codes. Experiments were conducted to compare the HLAN model and label embedding initialisation with state-of-the-art neural network based methods, including variants of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).
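
The label-wise attention idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each label has its own context vector, scores each token's hidden state by a plain dot product with that vector (the projection layer a full model would normally apply is omitted), and normalises the scores with a softmax. The names `label_wise_attention`, `hidden_states`, and `label_contexts` are hypothetical, and the numbers are toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def label_wise_attention(hidden_states, label_contexts):
    """Return one (weights, representation) pair per label.

    For each label, every hidden state is scored against that label's
    context vector; the softmax of the scores gives the attention weights,
    and the representation is the attention-weighted sum of hidden states.
    """
    dim = len(hidden_states[0])
    results = []
    for context in label_contexts:
        weights = softmax([dot(h, context) for h in hidden_states])
        representation = [
            sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)
        ]
        results.append((weights, representation))
    return results


# Toy example (assumed values): three token states, two labels.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
label_contexts = [[2.0, 0.0], [0.0, 2.0]]

attention = label_wise_attention(hidden_states, label_contexts)
for label_id, (weights, _) in enumerate(attention):
    # Each label receives its own distribution over the same tokens.
    print(label_id, [round(w, 3) for w in weights])
```

Because every label has a separate weight distribution, the tokens with the highest weights can be highlighted per label, which is the kind of explanation the abstract refers to. In a hierarchical model the same step would be repeated over sentence representations.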

RESULTS

HLAN achieved the best Micro-level AUC and F1 on the top-50 code prediction, 91.9% and 64.1%, respectively, and results comparable to other models on the NHS COVID-19 shielding code prediction, at around 97% Micro-level AUC. More importantly, in the analysis of model explanations, by highlighting the most salient words and sentences for each label, HLAN showed more meaningful and comprehensive model interpretation compared to the CNN-based models and its downgraded baselines, HAN and HA-GRU. Label embedding (LE) initialisation significantly boosted the previous state-of-the-art model, CNN with attention mechanisms, on the full code prediction to 52.5% Micro-level F1. The analysis of the layers initialised with label embeddings further explains the effect of this initialisation approach. The source code of the implementation and the results are openly available at https://github.com/acadTags/Explainable-Automated-Medical-Coding.
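
For readers unfamiliar with micro-level scores, the following sketch shows how a micro-averaged F-measure can be computed for multi-label predictions. It assumes the micro-level F score reported above is the usual micro-averaged F1, in which true positives, false positives, and false negatives are pooled over all (document, label) pairs before precision and recall are calculated. The function name `micro_f1` and the toy matrices are hypothetical and unrelated to the paper's data.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for binary multi-label matrices (lists of 0/1 rows)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1  # code assigned and predicted
            elif p:
                fp += 1  # code predicted but not assigned
            elif t:
                fn += 1  # code assigned but missed
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correct
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (assumed values): two documents, three candidate codes.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
score = micro_f1(y_true, y_pred)
print(round(score, 4))
```

Pooling counts in this way means frequent codes dominate the score, which is one reason micro-level and macro-level metrics can differ substantially on the full code set.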

CONCLUSION

We draw the following conclusions from the evaluation results and analyses. First, with hierarchical label-wise attention mechanisms, HLAN can provide results better than or comparable to the state-of-the-art CNN-based models for automated coding. Second, HLAN can provide more comprehensive explanations for each label by highlighting key words and sentences in the discharge summaries, compared to the n-grams in the CNN-based models and the downgraded baselines, HAN and HA-GRU. Third, the performance of deep learning based multi-label classification for automated coding can be consistently boosted by initialising the models with label embeddings that capture the correlations among labels. We further discuss the advantages and drawbacks of the overall method regarding its potential to be deployed in a hospital and suggest areas for future studies.

Similar articles

Hierarchical label-wise attention transformer model for explainable ICD coding.

J Biomed Inform. 2022 Sep;133:104161. doi: 10.1016/j.jbi.2022.104161. Epub 2022 Aug 20.

Towards automated clinical coding.

Int J Med Inform. 2018 Dec;120:50-61. doi: 10.1016/j.ijmedinf.2018.09.021. Epub 2018 Oct 2.

Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing.

J Biomed Inform. 2022 Mar;127:103984. doi: 10.1016/j.jbi.2021.103984. Epub 2022 Jan 7.

An explainable CNN approach for medical codes prediction from clinical text.

BMC Med Inform Decis Mak. 2021 Nov 16;21(Suppl 9):256. doi: 10.1186/s12911-021-01615-6.

JLAN: medical code prediction via joint learning attention networks and denoising mechanism.

BMC Bioinformatics. 2021 Dec 13;22(1):590. doi: 10.1186/s12859-021-04520-x.

Medical code prediction via capsule networks and ICD knowledge.

BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):55. doi: 10.1186/s12911-021-01426-9.

An empirical evaluation of deep learning for ICD-9 code assignment using MIMIC-III clinical notes.

Comput Methods Programs Biomed. 2019 Aug;177:141-153. doi: 10.1016/j.cmpb.2019.05.024. Epub 2019 May 25.

Incorporating medical code descriptions for diagnosis prediction in healthcare.

BMC Med Inform Decis Mak. 2019 Dec 19;19(Suppl 6):267. doi: 10.1186/s12911-019-0961-2.

A comparative study on deep learning models for text classification of unstructured medical notes with various levels of class imbalance.

BMC Med Res Methodol. 2022 Jul 2;22(1):181. doi: 10.1186/s12874-022-01665-y.

Cited by

Exploring the consistency, quality and challenges in manual and automated coding of free-text diagnoses from hospital outpatient letters.

PLoS One. 2025 Aug 25;20(8):e0328108. doi: 10.1371/journal.pone.0328108. eCollection 2025.

A feature explainability-based deep learning technique for diabetic foot ulcer identification.

Sci Rep. 2025 Feb 25;15(1):6758. doi: 10.1038/s41598-025-90780-z.

Optimising the paradigms of human AI collaborative clinical coding.
