Enhanced ICD-10 code assignment of clinical texts: A summarization-based approach.

Affiliations

College of Biomedical Engineering and Instrument Science, Zhejiang University, Zheda Road, 310027 Hangzhou, Zhejiang Province, China.

Department of Information, Hainan Hospital of Chinese PLA General Hospital, Haitang Bay, 572013 Sanya, Hainan Province, China.

Publication information

Artif Intell Med. 2024 Oct;156:102967. doi: 10.1016/j.artmed.2024.102967. Epub 2024 Aug 20.

Abstract

BACKGROUND

Assigning International Classification of Diseases (ICD) codes to clinical texts is a common and crucial practice in patient classification, hospital management, and downstream statistical analysis. Current auto-coding methods mainly treat this task as a multi-label classification problem. Such solutions suffer from a high-dimensional mapping space and from excessive redundant information in long clinical texts. To alleviate these problems, we introduce text summarization methods to ICD coding and apply text matching to select ICD codes.

METHOD

We focus on the tenth revision of the ICD (ICD-10) and design a novel summarization-based approach (SuM) with an end-to-end strategy to efficiently assign ICD-10 codes to clinical texts. In this approach, a knowledge-guided pointer network is proposed to distill and summarize the key information in clinical texts precisely. A matching model with a matching-aggregation architecture then aligns the summary with candidate codes, turning the one-vs-all scenario into one-vs-one matching so that the large-label-space obstacle inherent in classification approaches is avoided.
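The one-vs-one reformulation can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the function names (`match_score`, `rank_codes`), the three shortened code descriptions, and above all the scoring function, which is a plain Jaccard token overlap standing in for the learned matching-aggregation model described in the abstract. The point is only the shape of the computation: each (summary, code description) pair is scored independently and the codes are ranked, rather than predicting over the whole label space at once.

```python
from __future__ import annotations


def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokenization (a deliberately crude stand-in)."""
    return set(text.lower().split())


def match_score(summary: str, description: str) -> float:
    """Jaccard overlap between a summary and one code description.

    Assumption: this replaces the paper's learned matching model,
    whose details are not given in the abstract.
    """
    a, b = tokenize(summary), tokenize(description)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_codes(summary: str, code_descriptions: dict[str, str], k: int = 10):
    """Score every code independently (one-vs-one) and return the top k."""
    scored = [
        (code, match_score(summary, desc))
        for code, desc in code_descriptions.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical, shortened descriptions used only for this example.
CODES = {
    "I10": "essential primary hypertension",
    "E11": "type 2 diabetes mellitus",
    "J18": "pneumonia organism unspecified",
}

# Hypothetical output of the summarization step.
summary = "type 2 diabetes mellitus with poor glycemic control"
ranking = rank_codes(summary, CODES, k=3)
print(ranking[0])  # best-matching (code, score) pair
```

Because each pair is scored on its own, adding a new code only adds one more candidate to rank; no output layer has to grow with the label set.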

RESULT

A total of 12,788 ICD-10-coded discharge summaries from a Chinese hospital were collected to evaluate the proposed approach. Compared with existing methods, the proposed model achieves the best coding results, with a Micro AUC of 0.9548, MRR@10 of 0.7977, Precision@10 of 0.0944, and Recall@10 of 0.9439 on the TOP-50 Dataset. Results on the FULL-Dataset remain consistent. In addition, the proposed knowledge encoder and the end-to-end strategy are shown to help the whole model select the most suitable code.
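For readers unfamiliar with the ranking metrics, the following is a minimal sketch of how Precision@k, Recall@k, and the reciprocal rank are commonly computed for a single document; the dataset-level figures are then averages over documents. The function name `metrics_at_k` and the example codes are hypothetical, and the abstract does not state the authors' exact evaluation protocol, so this should be read as the standard textbook definitions rather than their implementation.

```python
def metrics_at_k(ranked, gold, k=10):
    """Return (precision@k, recall@k, reciprocal rank@k) for one document.

    ranked: list of predicted codes, best first.
    gold:   set of correct codes for the document.
    """
    top = ranked[:k]
    hits = sum(1 for code in top if code in gold)
    precision = hits / k
    recall = hits / len(gold) if gold else 0.0

    # Reciprocal rank of the first correct code within the top k, else 0.
    reciprocal_rank = 0.0
    for rank, code in enumerate(top, start=1):
        if code in gold:
            reciprocal_rank = 1.0 / rank
            break
    return precision, recall, reciprocal_rank


# Hypothetical ranking of ten codes with a single gold code at rank 2.
ranked = ["I10", "E11", "J18", "K21", "N18", "I25", "E78", "J44", "M54", "F32"]
gold = {"E11"}
precision, recall, rr = metrics_at_k(ranked, gold, k=10)
print(precision, recall, rr)  # 0.1 1.0 0.5
```

With exactly one gold code per document, Precision@10 can never exceed 0.1 and equals Recall@10 divided by 10. The reported values (0.9439 / 10 ≈ 0.0944) are numerically consistent with that setting, although the abstract itself does not say how many codes each summary carries.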

CONCLUSION

The proposed automatic ICD-10 code assignment approach, based on text summarization, effectively captures the critical information in long clinical texts and improves the performance of ICD-10 coding.
