Explainable COVID-19 detection using fractal dimension and vision transformer with Grad-CAM on cough sounds.

Authors

Sobahi Nebras, Atila Orhan, Deniz Erkan, Sengur Abdulkadir, Acharya U Rajendra

Affiliations

King Abdulaziz University, Department of Electrical and Computer Engineering, Jeddah, Saudi Arabia.

Firat University, Technology Faculty, Electrical and Electronics Engineering Department, Elazig, Turkey.

Publication information

Biocybern Biomed Eng. 2022 Jul-Sep;42(3):1066-1080. doi: 10.1016/j.bbe.2022.08.005. Epub 2022 Sep 6.

DOI:10.1016/j.bbe.2022.08.005

PMID:36092540

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9444505/

Abstract

The polymerase chain reaction (PCR) test is time-intensive and requires physical contact, which puts healthcare personnel at risk. Contactless, fast detection tests are therefore valuable. Cough sound is an important indicator of COVID-19, and this paper develops a novel explainable scheme for cough sound-based COVID-19 detection. Because the input audio may contain other sounds, the recording is first segmented into overlapping parts and each segment is labeled; the deep Yet Another Mobile Network (YAMNet) model is used for this labeling. The segments labeled as cough are then cropped and concatenated to reconstruct the pure cough sound. Next, four fractal dimension (FD) calculation methods are applied to the cough sound with an overlapping sliding window, and the resulting FD coefficients form a matrix. The constructed matrices are used to form fractal dimension images. Finally, a pretrained vision transformer (ViT) model classifies the constructed images into COVID-19, healthy, and symptomatic classes. We demonstrate the performance of the ViT on cough sound-based COVID-19 detection and provide a visual explanation of the inner workings of the ViT model. Three publicly available cough sound datasets, namely COUGHVID, VIRUFY, and COSWARA, are used in this study. We obtained accuracies of 98.45%, 98.15%, and 97.59% on the COUGHVID, VIRUFY, and COSWARA datasets, respectively. The developed model achieved the highest performance among the state-of-the-art methods compared and is ready to be tested in real-world applications.
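The sliding-window fractal dimension step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not name the four FD methods, so the Katz and Petrosian estimators are assumed here as two plausible examples. The window length, hop size, matrix layout (one row per method, one column per window), and all function names are hypothetical.

```python
import numpy as np


def katz_fd(x):
    """Katz fractal dimension of a 1-D signal (amplitude-only variant)."""
    x = np.asarray(x, dtype=float)
    steps = np.abs(np.diff(x))          # distances between successive samples
    total = steps.sum()                 # total curve length L
    if total == 0.0:
        return 1.0                      # convention chosen here for a flat window
    n = len(steps)                      # number of steps (needs n > 1)
    extent = np.max(np.abs(x - x[0]))   # farthest distance from the first sample
    return np.log10(n) / (np.log10(n) + np.log10(extent / total))


def petrosian_fd(x):
    """Petrosian fractal dimension based on sign changes of the first difference."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    diff = np.diff(x)
    sign_changes = np.sum(diff[1:] * diff[:-1] < 0)
    return np.log10(n) / (np.log10(n) + np.log10(n / (n + 0.4 * sign_changes)))


def fd_matrix(signal, win=256, hop=128, methods=(katz_fd, petrosian_fd)):
    """Slide an overlapping window over the signal and stack FD values.

    Assumed layout: rows are FD methods, columns are window positions.
    """
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - win + 1, hop)
    return np.array([[m(signal[s:s + win]) for s in starts] for m in methods])


# Synthetic stand-in for a reconstructed cough signal (not real data).
demo = np.random.default_rng(0).standard_normal(1000)
matrix = fd_matrix(demo)
print(matrix.shape)  # (2, 6)
```

A matrix like this could then be rescaled and saved as an image for an image classifier; how the paper maps FD coefficients to image pixels is not stated in the abstract, so that step is left out.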

