Farkas Richárd, Szarvas György
Research Group on Artificial Intelligence, Hungarian Academy of Sciences, Aradi Vértanúk tere 1, Szeged, Hungary.
BMC Bioinformatics. 2008 Apr 11;9 Suppl 3(Suppl 3):S10. doi: 10.1186/1471-2105-9-S3-S10.
In this paper we focus on the problem of automatically constructing ICD-9-CM coding systems for radiology reports. Health institutions use ICD-9-CM codes for billing, and the codes are assigned to clinical records manually after clinical treatment. Because this labeling task requires medical expertise, it is costly and error-prone: human annotators must consider thousands of possible codes when assigning the right ICD-9-CM labels to a document. In this study we use the datasets that the organisers of the International Challenge on Classifying Clinical Free Text Using Natural Language Processing made available in spring 2007 for training and testing automated ICD-9-CM coding systems. The challenge was dominated by entirely or partly rule-based systems that solve the coding task with a set of hand-crafted expert rules. Since building such systems by hand for thousands of ICD codes is of questionable feasibility, we examined how to construct automatically rule sets similar to those that achieved remarkable accuracy in the shared task.
Our results are very promising: we achieved results comparable to those of purely hand-crafted ICD-9-CM classifiers. Our best model reached an F measure of 90.26% on the training dataset and 88.93% on the challenge test dataset, using the micro-averaged F measure with β = 1, the official evaluation metric of the International Challenge on Classifying Clinical Free Text Using Natural Language Processing. This result would have placed second in the challenge, behind a hand-crafted system that achieved slightly better results.
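For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-averaged F measure with β = 1 can be computed for multi-label code assignment. It is not the challenge's official scorer: the function name `micro_f1`, the dictionary layout, and the toy report IDs and code assignments are assumptions made only for illustration. Micro-averaging pools true positives, false positives, and false negatives over all (document, code) pairs before computing precision and recall, so frequent codes weigh more than rare ones.

```python
from typing import Dict, Set, Tuple


def micro_f1(
    gold: Dict[str, Set[str]], pred: Dict[str, Set[str]]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), micro-averaged over all documents.

    `gold` and `pred` map a document ID to its set of assigned codes.
    This layout is a hypothetical choice for the sketch.
    """
    tp = fp = fn = 0
    # Visit every document that appears in either mapping.
    for doc_id in gold.keys() | pred.keys():
        gold_codes = gold.get(doc_id, set())
        pred_codes = pred.get(doc_id, set())
        tp += len(gold_codes & pred_codes)  # codes correctly assigned
        fp += len(pred_codes - gold_codes)  # codes assigned but not in gold
        fn += len(gold_codes - pred_codes)  # gold codes that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data, invented for illustration only (not from the challenge corpus).
gold = {"r1": {"786.2"}, "r2": {"486", "786.2"}, "r3": {"599.0"}}
pred = {"r1": {"786.2"}, "r2": {"486"}, "r3": {"599.0", "788.30"}}

precision, recall, f1 = micro_f1(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this toy case three code assignments are correct, one is spurious, and one is missed, so precision, recall, and F1 all equal 0.75.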
Our results demonstrate that hand-crafted systems, which proved successful in ICD-9-CM coding, can be reproduced by replacing several laborious steps in their construction with machine learning models. These hybrid systems preserve the favourable aspects of rule-based classifiers, such as good performance, while being faster to develop and requiring less human effort. Hence constructing such hybrid systems may be feasible for a label set an order of magnitude larger and with more labeled data.