Kim Juyong, Sharma Abheesht, Shanbhogue Suhas, Ravikumar Pradeep, Weiss Jeremy C
Machine Learning Department, Carnegie Mellon University.
Birla Institute of Technology & Science, Pilani - Goa Campus.
Proc Conf Empir Methods Nat Lang Process. 2022 Dec;2022(SD):109-120. doi: 10.18653/v1/2022.emnlp-demos.11.
Diagnostic coding, or ICD coding, is the task of assigning diagnosis codes defined by the ICD (International Classification of Diseases) standard to patient visits based on clinical notes. Manual ICD coding is time-consuming and error-prone, which motivates automatic ICD coding. However, despite the long history of automatic ICD coding, there has been no standardized framework for benchmarking ICD coding models. We open-source an easy-to-use tool that provides a streamlined pipeline for preprocessing data and for training and evaluating automatic ICD coding models. We correct preprocessing errors made in existing work, and we provide key models and weights trained on the correctly preprocessed datasets. We also provide an interactive demo that performs real-time inference on custom inputs, along with visualizations drawn from explainable AI for analyzing the models. We hope the framework helps move ICD coding research forward and helps professionals explore the potential of automatic ICD coding. The framework and the associated code are publicly available.