Multigranularity Label Prediction Model for Automatic International Classification of Diseases Coding in Clinical Text.

Author Affiliations

Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, P.R. China.

School of Computer Science, University of South China, Hengyang, P.R. China.

Publication Information

J Comput Biol. 2023 Aug;30(8):900-911. doi: 10.1089/cmb.2023.0096. Epub 2023 Jul 31.

Abstract

The International Classification of Diseases (ICD) serves as the foundation for generating comparable global disease statistics across regions and over time. ICD coding assigns codes to diseases based on clinical notes, so that a patient's condition is described in a standard way. However, this process is complicated by the vast number of codes and by the intricate taxonomy of ICD, in which codes are organized hierarchically into levels that include chapter, category, subcategory, and further subdivisions. Many existing studies focus solely on predicting subcategory codes and ignore the hierarchical relationships among codes. To address this limitation, we propose a multitask learning model that trains multiple classifiers for different code levels, while also capturing the relations between coarser-grained and finer-grained labels through a reinforcement mechanism. We evaluate our approach on both English and Chinese benchmark datasets and show that it achieves performance competitive with baseline models, particularly in terms of macro-F1. These findings suggest that our approach effectively leverages the hierarchical structure of ICD codes to improve the accuracy of disease code prediction. Analysis of the attention mechanism shows that the multigranularity attention of our model captures crucial features of the input text at different granularity levels, which can provide reasonable explanations for the prediction results.
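The abstract does not specify how the labels for each code level are constructed. As a rough illustration only, the sketch below shows one way per-level training targets for a multitask setup could be derived from fine-grained codes. It assumes ICD-10 style codes, where the first three characters form the category and the fourth character the subcategory. The function names and the deliberately partial chapter table are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Optional, Set

# Partial, illustrative chapter table using ICD-10 style category ranges.
# Assumption for this sketch only; it is not the label set used in the paper.
CHAPTER_RANGES = [
    ("E00", "E90", "IV"),  # endocrine, nutritional and metabolic diseases
    ("I00", "I99", "IX"),  # diseases of the circulatory system
    ("J00", "J99", "X"),   # diseases of the respiratory system
]


def chapter_of(category: str) -> Optional[str]:
    """Return the chapter for a 3-character category, or None if not in the toy table."""
    for low, high, chapter in CHAPTER_RANGES:
        if low <= category <= high:
            return chapter
    return None


def split_levels(code: str) -> Dict[str, Optional[str]]:
    """Split one code into chapter, category, and subcategory labels."""
    compact = code.replace(".", "").upper()
    category = compact[:3]
    # Codes with only three characters have no subcategory label.
    subcategory = compact[:4] if len(compact) >= 4 else None
    return {
        "chapter": chapter_of(category),
        "category": category,
        "subcategory": subcategory,
    }


def multigranularity_targets(codes: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect the label set at each granularity for one clinical note."""
    targets: Dict[str, Set[str]] = {
        "chapter": set(),
        "category": set(),
        "subcategory": set(),
    }
    for code in codes:
        for level, label in split_levels(code).items():
            if label is not None:
                targets[level].add(label)
    return targets


# One note annotated with three fine-grained codes.
targets = multigranularity_targets(["E11.9", "I10", "J18.9"])
```

Each of the three label sets could then serve as the target of its own classifier, with the coarser sets acting as additional supervision for the finer ones. How the paper's reinforcement mechanism links the levels is not described in the abstract and is not modeled here.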
