AnEMIC: A Framework for Benchmarking ICD Coding Models.

Authors

Kim Juyong, Sharma Abheesht, Shanbhogue Suhas, Ravikumar Pradeep, Weiss Jeremy C

Affiliations

Machine Learning Department, Carnegie Mellon University.

Birla Institute of Technology & Science, Pilani - Goa Campus.

Publication information

Proc Conf Empir Methods Nat Lang Process. 2022 Dec;2022(SD):109-120. doi: 10.18653/v1/2022.emnlp-demos.11.

DOI:10.18653/v1/2022.emnlp-demos.11

PMID:38476318

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10929571/

Abstract

Diagnostic coding, or ICD coding, is the task of assigning diagnosis codes defined by the International Classification of Diseases (ICD) standard to patient visits based on clinical notes. Manual ICD coding is time-consuming and often error-prone, which motivates automatic ICD coding. Yet despite the long history of automatic ICD coding, no standardized framework has existed for benchmarking ICD coding models. We open-source an easy-to-use tool named AnEMIC, which provides a streamlined pipeline for preprocessing, training, and evaluating automatic ICD coding models. We correct preprocessing errors in existing works and provide key models and weights trained on the correctly preprocessed datasets. We also provide an interactive demo that performs real-time inference on custom inputs, along with visualizations drawn from explainable AI for analyzing the models. We hope the framework helps move ICD coding research forward and helps professionals explore the potential of automatic ICD coding. The framework and the associated code are publicly available.
