Hierarchical label-wise attention transformer model for explainable ICD coding.

Affiliations

Centre for Big Data Research in Health, University of New South Wales, Sydney, Australia.

Publication information

J Biomed Inform. 2022 Sep;133:104161. doi: 10.1016/j.jbi.2022.104161. Epub 2022 Aug 20.

Abstract

International Classification of Diseases (ICD) coding plays an important role in systematically classifying morbidity and mortality data. In this study, we propose a hierarchical label-wise attention Transformer model (HiLAT) for the explainable prediction of ICD codes from clinical documents. HiLAT first fine-tunes a pretrained Transformer model to represent the tokens of clinical documents. We then employ a two-level hierarchical label-wise attention mechanism that creates label-specific document representations. These representations are in turn used by a feed-forward neural network to predict whether a specific ICD code is assigned to the input clinical document. We evaluate HiLAT using hospital discharge summaries and their corresponding ICD-9 codes from the MIMIC-III database. To investigate the performance of different types of Transformer models, we develop ClinicalplusXLNet, which continues pretraining from XLNet-Base on all the MIMIC-III clinical notes. The experimental results show that HiLAT + ClinicalplusXLNet achieves higher F1 scores than the previous state-of-the-art models on the 50 most frequent ICD-9 codes in MIMIC-III. Visualisations of attention weights present a potential explainability tool for checking the face validity of ICD code predictions.
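
The two-level label-wise attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the scoring function, so plain dot-product attention is assumed; splitting a document into fixed-size chunks is also an assumption; random vectors stand in for the fine-tuned Transformer token states; and a single linear unit per label stands in for the feed-forward classifier. All names (`hierarchical_label_attention`, `token_queries`, `chunk_queries`) are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along one axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def hierarchical_label_attention(token_states, token_queries, chunk_queries,
                                 out_weights, out_bias):
    """Assumed shapes:
    token_states:  (n_chunks, n_tokens, d)  token vectors per chunk
    token_queries: (n_labels, d)            one query per label, level 1
    chunk_queries: (n_labels, d)            one query per label, level 2
    out_weights:   (n_labels, d), out_bias: (n_labels,)
    """
    # Level 1: each label attends over the tokens inside each chunk.
    token_scores = np.einsum("ctd,ld->clt", token_states, token_queries)
    token_weights = softmax(token_scores, axis=2)
    chunk_reps = np.einsum("clt,ctd->cld", token_weights, token_states)

    # Level 2: each label attends over its own chunk representations.
    chunk_scores = np.einsum("cld,ld->lc", chunk_reps, chunk_queries)
    chunk_weights = softmax(chunk_scores, axis=1)
    doc_reps = np.einsum("lc,cld->ld", chunk_weights, chunk_reps)

    # One binary decision per label (simplified to a linear unit + sigmoid).
    logits = (doc_reps * out_weights).sum(axis=1) + out_bias
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs, token_weights, chunk_weights

# Toy example: 3 chunks of 8 tokens, hidden size 16, 50 labels.
rng = np.random.default_rng(0)
n_chunks, n_tokens, d, n_labels = 3, 8, 16, 50
token_states = rng.normal(size=(n_chunks, n_tokens, d))
token_queries = rng.normal(size=(n_labels, d))
chunk_queries = rng.normal(size=(n_labels, d))
out_weights = rng.normal(size=(n_labels, d))
out_bias = np.zeros(n_labels)

probs, token_weights, chunk_weights = hierarchical_label_attention(
    token_states, token_queries, chunk_queries, out_weights, out_bias
)
```

In this sketch, `token_weights[c, l]` and `chunk_weights[l]` are the per-label attention distributions; weights of this kind are what an attention visualisation would display for a predicted code.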
