Li Xiaobo, Zhang Yijia, Hou Xiaodi, Wang Shilong, Lin Hongfei
School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning, 116026, China.
Artif Intell Med. 2025 Oct;168:103187. doi: 10.1016/j.artmed.2025.103187. Epub 2025 Jul 10.
Automatic International Classification of Diseases (ICD) coding assigns unique medical codes to the diseases described in clinical texts, supporting downstream data statistics, quality control, billing and other tasks. The efficiency and accuracy of medical code assignment are a significant challenge for healthcare. In clinical practice, Electronic Health Record (EHR) data are usually complex, heterogeneous, non-standard and unstructured, and manual coding is time-consuming, laborious and error-prone. Traditional machine learning methods struggle to extract salient semantic information from clinical texts accurately, whereas recent progress in Deep Learning (DL) has shown promising results in addressing these issues.
This paper comprehensively reviewed recent advances in deep learning for automatic ICD coding. It aimed to reveal prominent challenges and emerging trends by summarizing and analyzing each model's publication year, design motivation, deep neural network architecture, and auxiliary data.
This review systematically surveyed the literature on deep learning-based automatic ICD coding methods. We screened five online databases (Web of Science, SpringerLink, PubMed, ACM, and the IEEE digital library) and collected 53 articles on deep learning-based ICD coding published from 2017 to 2023.
The reviewed deep neural network methods aimed to overcome challenges such as lengthy and noisy clinical text, the high dimensionality and functional relationships of medical codes, and long-tail label distribution. Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), attention mechanisms, Transformers, and Pre-trained Language Models (PLMs), among others, have become popular for addressing prominent issues in ICD coding. Meanwhile, introducing medical ontology from within the ICD coding system (code descriptions and code hierarchy) and external knowledge (Wikipedia articles, tabular data, Clinical Classification Software (CCS), PLMs fine-tuned on biomedical corpora, entity recognition and concept extraction) has become an emerging trend in automatic ICD coding.
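As a rough illustration of how an attention mechanism can be used for code assignment, the sketch below lets each candidate code attend over the tokens of a note and then scores that code independently. It is not taken from any of the reviewed papers. The function name `label_wise_attention`, the placeholder codes `CODE_A` and `CODE_B`, the toy two-dimensional vectors, and the use of an independent sigmoid per code are all assumptions made for illustration; in a real model the token vectors would come from a CNN, RNN or PLM encoder and the parameters would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def label_wise_attention(token_vectors, label_queries, label_weights, label_biases):
    """Score each code with its own attention distribution over the tokens.

    Returns (probabilities, attention), both keyed by code.
    """
    dim = len(token_vectors[0])
    probabilities, attention = {}, {}
    for code, query in label_queries.items():
        # Attention weights: how relevant each token is to this code.
        weights = softmax([dot(query, tok) for tok in token_vectors])
        # Code-specific document vector: attention-weighted sum of tokens.
        context = [
            sum(w * tok[i] for w, tok in zip(weights, token_vectors))
            for i in range(dim)
        ]
        # Independent sigmoid per code (assumed here, so several codes can fire).
        logit = dot(label_weights[code], context) + label_biases[code]
        probabilities[code] = 1.0 / (1.0 + math.exp(-logit))
        attention[code] = weights
    return probabilities, attention


# Toy inputs: three token vectors and two hypothetical codes.
token_vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
label_queries = {"CODE_A": [2.0, 0.0], "CODE_B": [0.0, 2.0]}
label_weights = {"CODE_A": [1.0, -1.0], "CODE_B": [-1.0, 1.0]}
label_biases = {"CODE_A": 0.0, "CODE_B": 0.0}

probs, attn = label_wise_attention(
    token_vectors, label_queries, label_weights, label_biases
)
for code in probs:
    print(code, round(probs[code], 3), [round(w, 3) for w in attn[code]])
```

Because every code has its own attention weights, the weights can also be inspected to see which tokens drove a given prediction.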
This paper provided a comprehensive review, from a unique perspective, of recent literature on applying deep learning to improve medical code assignment. Multiple neural network methods (CNNs, RNNs, Transformers, PLMs and, especially, attention mechanisms) have been successfully applied to ICD coding tasks and achieved excellent performance. Various medical auxiliary data have also proven valuable in enhancing model feature representation and classification performance. Our in-depth and systematic analysis suggested that deep learning-based automatic ICD coding has a bright future in healthcare. Finally, we discussed some major challenges and outlined future development directions.