Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:2335-2338. doi: 10.1109/EMBC46164.2021.9629552.
During the coronavirus disease 2019 (COVID-19) pandemic, early screening is essential to limit transmission. Recent studies show that computer audition techniques have the potential to deliver a fast, cheap, and ecologically friendly diagnosis of COVID-19. Respiratory sounds and speech may carry rich and complementary information about a patient's COVID-19 clinical condition. We therefore propose training three deep neural networks, one for each of three types of sound (breathing, counting, and vowel), and combining these models into an ensemble to improve performance. More specifically, we employ Convolutional Neural Networks (CNNs) to extract spatial representations from log Mel spectrograms, and the multi-head attention mechanism of the transformer to mine temporal context information from the CNNs' outputs. The experimental results demonstrate that the transformer-based CNNs can effectively detect COVID-19 on the DiCOVA Track-2 database (AUC: 70.0%) and outperform both simple CNNs and hybrid CNN-RNNs.
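The fusion of the three per-sound models and the AUC metric reported above can be illustrated with a minimal sketch. The abstract does not state how the three models are combined, so the simple probability averaging below is an assumption, not the authors' method. The function names `fuse_scores` and `roc_auc` and all scores are hypothetical toy values chosen for illustration; they are not results from the paper.

```python
from statistics import fmean


def fuse_scores(per_model_scores):
    """Average the per-recording COVID-19 probabilities across models.

    Assumption: late fusion by unweighted mean. The abstract only says
    the three models are combined, not how.
    """
    n = len(per_model_scores[0])
    if any(len(scores) != n for scores in per_model_scores):
        raise ValueError("every model must score the same recordings")
    return [fmean(column) for column in zip(*per_model_scores)]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = COVID-19 positive, 0 = negative.
labels = [1, 1, 0, 0, 0]
breathing = [0.9, 0.4, 0.5, 0.2, 0.1]  # scores from the breathing model
counting = [0.6, 0.7, 0.3, 0.6, 0.2]   # scores from the counting model
vowel = [0.7, 0.5, 0.6, 0.3, 0.4]      # scores from the vowel model

fused = fuse_scores([breathing, counting, vowel])

for name, scores in [("breathing", breathing), ("counting", counting),
                     ("vowel", vowel), ("fused", fused)]:
    print(f"{name}: AUC = {roc_auc(labels, scores):.3f}")
```

In this toy case each model misranks a different recording, so averaging ranks both positives above every negative. That is the kind of complementarity the abstract appeals to, although real gains depend on how correlated the models' errors are.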