Solti Imre, Cooke Colin R, Xia Fei, Wurfel Mark M
Department of Medical Education and Biomedical Informatics, University of Washington, Seattle WA.
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2009 Nov;2009:314-319. doi: 10.1109/BIBMW.2009.5332081.
This paper compares the performance of keyword-based and machine learning-based classification of chest x-ray reports for Acute Lung Injury (ALI). ALI mortality is approximately 30 percent. High mortality is, in part, a consequence of delayed manual chest x-ray classification. An automated system could reduce the time to recognize ALI and lead to reductions in mortality. For our study, domain experts labeled two corpora of 96 and 857 chest x-ray reports for ALI. We developed a keyword-based classification system and a Maximum Entropy-based classification system. Word unigrams and character n-grams provided the features for the machine learning system. The Maximum Entropy algorithm with character 6-grams achieved the highest performance (Recall=0.91, Precision=0.90 and F-measure=0.91) on the 857-report corpus. This study has shown that, for the classification of ALI chest x-ray reports, the machine learning approach is superior to the keyword-based system and achieves results comparable to those of the highest-performing physician annotators.
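As a rough illustration of the feature types and evaluation measures named in the abstract, the following Python sketch extracts word unigrams and character 6-grams from a report and computes precision, recall, and the balanced F-measure. The abstract does not describe tokenization or text normalization, so the lowercasing and whitespace handling here are assumptions, and the function names and the example sentence are hypothetical rather than taken from the study.

```python
from collections import Counter


def char_ngrams(text, n=6):
    """Count overlapping character n-grams.

    Assumption: the text is lowercased and runs of whitespace are
    collapsed to single spaces; the abstract does not state this.
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def word_unigrams(text):
    """Count lowercased, whitespace-delimited tokens (assumed tokenization)."""
    return Counter(text.lower().split())


def precision_recall_f(gold, predicted):
    """Precision, recall, and balanced F-measure for the positive class.

    `gold` and `predicted` are equal-length sequences of booleans,
    where True marks a report labeled as consistent with ALI.
    """
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Illustrative sentence only; it is not drawn from the study corpora.
report = "Bilateral diffuse infiltrates"
unigram_features = word_unigrams(report)
sixgram_features = char_ngrams(report, n=6)
```

Feature counts of this kind could then be passed to any Maximum Entropy learner; for a two-class problem such as ALI versus non-ALI, a Maximum Entropy model is equivalent to logistic regression.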