Instituto Federal de Educação, Ciência e Tecnologia do Paraná (IFPR), Pinhais, PR, Brazil; Pontifícia Universidade Católica do Paraná (PUCPR), Curitiba, PR, Brazil.
Universidade Tecnológica Federal do Paraná (UTFPR), Campo Mourão, PR, Brazil; Universidade Estadual de Maringá (UEM), Maringá, PR, Brazil.
Comput Methods Programs Biomed. 2020 Oct;194:105532. doi: 10.1016/j.cmpb.2020.105532. Epub 2020 May 8.
COVID-19 can cause severe pneumonia and is expected to have a high impact on the healthcare system. Early diagnosis is crucial for correct treatment and may help reduce the strain on the healthcare system. The standard imaging tests for diagnosing pneumonia are chest X-ray (CXR) and computed tomography (CT). Although CT is the gold standard, CXR remains useful because it is cheaper, faster, and more widely available. This study aims to distinguish pneumonia caused by COVID-19 from other types of pneumonia and from healthy lungs, using only CXR images.
To achieve these objectives, we proposed a classification schema that considers two perspectives: i) multi-class classification; and ii) hierarchical classification, since the types of pneumonia can be organized as a hierarchy. Given the natural class imbalance in this domain, we also proposed using resampling algorithms within the schema to rebalance the class distribution. We observed that texture is one of the main visual attributes of CXR images; accordingly, our classification schema extracts features using several well-known texture descriptors as well as a pre-trained CNN model. We also explored early and late fusion techniques in the schema to leverage the strengths of multiple texture descriptors and base classifiers at once. To evaluate the approach, we composed a database, named RYDLS-20, containing CXR images of pneumonia caused by different pathogens as well as CXR images of healthy lungs. The class distribution follows a real-world scenario in which some pathogens are more common than others.
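To make the hierarchical perspective concrete, here is a minimal sketch of one common strategy: top-down prediction with a local classifier at each parent node. The abstract does not say which hierarchical strategy the authors used, and the hierarchy below (`HIERARCHY`) and the threshold "classifiers" (`toy_classifiers`) are hypothetical stand-ins, not the RYDLS-20 taxonomy or trained models.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical label hierarchy (illustrative only, not the RYDLS-20 taxonomy):
# each internal node maps to its child labels; leaves have no entry.
HIERARCHY: Dict[str, List[str]] = {
    "root": ["normal", "pneumonia"],
    "pneumonia": ["viral", "bacterial"],
    "viral": ["covid19", "other_viral"],
}

# A local classifier receives a feature vector and returns one child label.
LocalClassifier = Callable[[Sequence[float]], str]


def predict_top_down(
    x: Sequence[float],
    classifiers: Dict[str, LocalClassifier],
    hierarchy: Dict[str, List[str]],
    root: str = "root",
) -> List[str]:
    """Descend from the root, letting each parent node's classifier choose
    a child, until a leaf is reached. Returns the predicted path."""
    path: List[str] = []
    node = root
    while node in hierarchy:  # internal node: keep descending
        child = classifiers[node](x)
        if child not in hierarchy[node]:
            raise ValueError(f"{child!r} is not a child of {node!r}")
        path.append(child)
        node = child
    return path


# Toy stand-ins for trained local classifiers (hypothetical thresholds on
# hypothetical features, for demonstration only).
toy_classifiers: Dict[str, LocalClassifier] = {
    "root": lambda x: "pneumonia" if x[0] > 0.5 else "normal",
    "pneumonia": lambda x: "viral" if x[1] > 0.5 else "bacterial",
    "viral": lambda x: "covid19" if x[2] > 0.5 else "other_viral",
}

print(predict_top_down([0.9, 0.8, 0.7], toy_classifiers, HIERARCHY))
# ['pneumonia', 'viral', 'covid19']
print(predict_top_down([0.1, 0.8, 0.7], toy_classifiers, HIERARCHY))
# ['normal']
```

Each local classifier only has to separate siblings under one parent, which is the appeal of this design; the trade-off is that a mistake near the root cannot be corrected further down the path.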
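Resampling can be illustrated with random oversampling, one of the simplest rebalancing strategies. This is an assumed, swapped-in example: the abstract only says resampling algorithms were used, not which ones (SMOTE-style synthetic oversampling or undersampling are common alternatives). The function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def random_oversample(
    samples: Sequence[List[float]],
    labels: Sequence[str],
    seed: int = 0,
) -> Tuple[List[List[float]], List[str]]:
    """Random oversampling: duplicate randomly chosen samples of every
    minority class until each class matches the size of the largest one."""
    rng = random.Random(seed)  # seeded for reproducibility
    target = max(Counter(labels).values())
    by_class: Dict[str, List[List[float]]] = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    out_x = list(samples)
    out_y = list(labels)
    for y, members in by_class.items():
        for _ in range(target - len(members)):
            out_x.append(rng.choice(members))
            out_y.append(y)
    return out_x, out_y


# Toy, imbalanced training set (hypothetical 1-D feature vectors).
X_train = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y_train = ["normal", "normal", "normal", "normal", "covid19"]

X_bal, y_bal = random_oversample(X_train, y_train)
print(Counter(y_bal))  # Counter({'normal': 4, 'covid19': 4})
```

Resampling of this kind is normally applied only to the training split; the test split keeps its natural, imbalanced distribution so that the evaluation still reflects the real-world scenario.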
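The difference between early and late fusion can be sketched as follows, assuming early fusion means concatenating feature vectors and late fusion means combining base-classifier outputs with the sum rule. The sum rule is just one example (product, max, and majority vote are other common choices), and the abstract does not state which rules were used. All names and numbers below are hypothetical.

```python
from typing import Dict, List, Sequence


def early_fusion(feature_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Early fusion: concatenate the feature vectors produced by several
    descriptors into one vector, which then feeds a single classifier."""
    fused: List[float] = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def late_fusion_sum(outputs: Sequence[Dict[str, float]]) -> str:
    """Late fusion with the sum rule: add up the class-probability outputs
    of several base classifiers and return the class with the largest total."""
    totals: Dict[str, float] = {}
    for probs in outputs:
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + p
    return max(totals, key=lambda label: totals[label])


# Hypothetical descriptor outputs for one image.
texture_features = [0.1, 0.2]
cnn_features = [0.5, 0.6, 0.7]
print(early_fusion([texture_features, cnn_features]))
# [0.1, 0.2, 0.5, 0.6, 0.7]

# Hypothetical probability outputs of three base classifiers for one image.
clf_outputs = [
    {"covid19": 0.6, "normal": 0.4},
    {"covid19": 0.3, "normal": 0.7},
    {"covid19": 0.7, "normal": 0.3},
]
print(late_fusion_sum(clf_outputs))
# covid19
```

Early fusion lets one classifier see all descriptors jointly, while late fusion keeps one classifier per descriptor (or per algorithm) and merges only their decisions.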
Tested on RYDLS-20, the proposed approach achieved a macro-averaged F1-score of 0.65 in the multi-class scenario and an F1-score of 0.89 for COVID-19 identification in the hierarchical classification scenario.
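Because the two figures above are different metrics (a macro average over all classes versus the F1-score of the COVID-19 class alone), a small worked example may help. This sketch computes per-class F1 as 2TP / (2TP + FP + FN) and the macro average as their unweighted mean; the labels are hypothetical toy data, not RYDLS-20 results, and returning 0.0 for an undefined F1 is an assumed convention.

```python
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 for each class c, computed as 2*TP / (2*TP + FP + FN)."""
    scores: Dict[str, float] = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0  # assumed convention
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Macro average: unweighted mean of per-class F1, so every class counts
    equally regardless of how many samples it has."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy, imbalanced example (hypothetical labels, not RYDLS-20 data).
y_true = ["normal", "normal", "normal", "normal", "covid19", "viral"]
y_pred = ["normal", "normal", "normal", "normal", "covid19", "normal"]

print(per_class_f1(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 3))  # 0.63
```

Here accuracy is 5/6 (about 0.83), yet the macro-averaged F1 is 17/27 (about 0.63) because the single missed minority-class sample drives that class's F1 to 0. This is why macro-averaged F1 is a stricter measure in an imbalanced setting.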
To the best of our knowledge, the top identification rate obtained in this paper is the best nominal rate reported for COVID-19 identification in an imbalanced setting with more than three classes. We must also highlight the novel hierarchical classification approach proposed for this task, which considers the types of pneumonia caused by the different pathogens and led to the best COVID-19 recognition rate obtained here.