Automatic construction of rule-based ICD-9-CM coding systems.

Author information

Farkas Richárd, Szarvas György

Affiliation

Research Group on Artificial Intelligence, Hungarian Academy of Sciences, Aradi Vértanúk tere 1, Szeged, Hungary.

Publication information

BMC Bioinformatics. 2008 Apr 11;9 Suppl 3(Suppl 3):S10. doi: 10.1186/1471-2105-9-S3-S10.

Abstract

BACKGROUND

In this paper we focus on the problem of automatically constructing ICD-9-CM coding systems for radiology reports. ICD-9-CM codes are used for billing purposes by healthcare institutions and are assigned to clinical records manually following clinical treatment. Since this labeling task requires expert medical knowledge, the process is costly and prone to errors, as human annotators have to consider thousands of possible codes when assigning the right ICD-9-CM labels to a document. In this study we use the datasets made available for training and testing automated ICD-9-CM coding systems by the organisers of the International Challenge on Classifying Clinical Free Text Using Natural Language Processing in spring 2007. The challenge itself was dominated by entirely or partly rule-based systems that solve the coding task using a set of hand-crafted expert rules. Since it is questionable whether such systems can feasibly be constructed by hand for thousands of ICD codes, we decided to examine the problem of automatically constructing similar rule sets, which turned out to achieve remarkable accuracy in the shared task challenge.

RESULTS

Our results are very promising in that we achieved results comparable to those of purely hand-crafted ICD-9-CM classifiers. Our best model achieved a 90.26% F measure on the training dataset and an 88.93% F measure on the challenge test dataset, using the micro-averaged F measure with β = 1, the official evaluation metric of the International Challenge on Classifying Clinical Free Text Using Natural Language Processing. This result would have placed second in the challenge, behind a hand-crafted system that achieved slightly better results.
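
For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-averaged F measure with β = 1 can be computed for multi-label code assignment. It is an illustration only, not the challenge's official scorer: the function name `micro_f_beta` and the toy gold and predicted code sets are assumptions introduced here, and the example codes are not taken from the challenge data. Micro-averaging pools true positives, false positives, and false negatives over all documents and labels before computing precision and recall.

```python
from typing import Iterable, Set, Tuple


def micro_f_beta(
    gold: Iterable[Set[str]],
    pred: Iterable[Set[str]],
    beta: float = 1.0,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F_beta), micro-averaged over documents.

    `gold` and `pred` are parallel sequences of code sets, one set per
    document. Hypothetical helper for illustration, not the official scorer.
    """
    tp = fp = fn = 0
    for gold_codes, pred_codes in zip(gold, pred):
        tp += len(gold_codes & pred_codes)   # codes correctly assigned
        fp += len(pred_codes - gold_codes)   # codes assigned but not in gold
        fn += len(gold_codes - pred_codes)   # gold codes that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0

    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Toy data (assumed for illustration): three reports with gold and predicted codes.
gold = [{"786.2"}, {"599.0", "593.70"}, {"486"}]
pred = [{"786.2"}, {"599.0"}, {"486", "786.2"}]

precision, recall, f1 = micro_f_beta(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this toy case three codes are correct, one is spurious, and one is missed, so precision and recall are both 3/4. Because counts are pooled across documents, frequent codes influence a micro-averaged score more than rare ones.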

CONCLUSIONS

Our results demonstrate that hand-crafted systems, which proved successful in ICD-9-CM coding, can be reproduced by replacing several laborious steps in their construction with machine learning models. These hybrid systems preserve the favourable aspects of rule-based classifiers, such as good performance, while their development is rapid and requires less human effort. Hence the construction of such hybrid systems can be feasible for a set of labels an order of magnitude larger, and with more labeled data.
