Dentamaro Vincenzo, Giglio Paolo, Impedovo Donato, Moretti Luigi, Pirlo Giuseppe
Università degli studi di Bari "Aldo Moro", Department of Computer Science, via Orabona 4, Bari, 70125, Italy.
Università degli studi di Bari "Aldo Moro", Medical School, Bari, Italy.
Pattern Recognit. 2022 Jul;127:108656. doi: 10.1016/j.patcog.2022.108656. Epub 2022 Mar 15.
This study presents the Auditory Cortex ResNet (AUCO ResNet), a biologically inspired deep neural network designed for sound classification and, more specifically, for Covid-19 recognition from audio recordings of coughs and breaths. Unlike other approaches, it can be trained end-to-end, so that gradient descent optimizes every module of the learning algorithm: Mel-like filter design, feature extraction, feature selection, dimensionality reduction, and prediction. The network includes three attention mechanisms: the squeeze-and-excitation mechanism, the convolutional block attention module, and the novel sinusoidal learnable attention. The attention mechanism merges relevant information from activation maps at various levels of the network. The network takes raw audio files as input and can also fine-tune the feature extraction phase: a Mel-like filter is designed during training, adapting the filter banks to the important frequencies. AUCO ResNet achieves state-of-the-art results on several datasets. First, it was tested on several datasets containing Covid-19 cough and breath recordings. This choice was made because cough and breath are language independent, which allows cross-dataset tests aimed at assessing generalization. These tests demonstrate that the approach can be adopted as a low-cost, fast, and remote Covid-19 pre-screening tool. The network was also tested on the well-known UrbanSound 8K dataset, achieving state-of-the-art accuracy without any data preprocessing or data augmentation.
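For readers unfamiliar with the term, the "Mel-like filter" mentioned in the abstract can be related to the conventional fixed Mel filter bank. The sketch below is not the authors' implementation: it builds the standard triangular bank from the common formula mel = 2595 log10(1 + f/700), and the function names (`hz_to_mel`, `mel_edge_frequencies`, `filter_bank`) are hypothetical. The abstract does not say how the learned filters are parameterized; one plausible reading, stated here only as an assumption, is that edge frequencies like the ones computed below serve as a starting point that training then adjusts.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_edge_frequencies(n_filters, f_min, f_max):
    """Return n_filters + 2 frequencies (Hz), equally spaced on the mel scale.

    Filter i uses points i, i+1, i+2 as its left edge, peak, and right edge.
    In a learnable front-end these values would be trainable parameters
    (an assumption for illustration, not a detail given in the abstract).
    """
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]


def triangular_response(f_hz, left, center, right):
    """Gain of one triangular filter at frequency f_hz (peak gain 1.0)."""
    if f_hz <= left or f_hz >= right:
        return 0.0
    if f_hz <= center:
        return (f_hz - left) / (center - left)
    return (right - f_hz) / (right - center)


def filter_bank(points, freqs):
    """Evaluate every filter defined by `points` at each frequency in `freqs`."""
    return [
        [triangular_response(f, points[i], points[i + 1], points[i + 2])
         for f in freqs]
        for i in range(len(points) - 2)
    ]


if __name__ == "__main__":
    edges = mel_edge_frequencies(n_filters=4, f_min=0.0, f_max=8000.0)
    print([round(e, 1) for e in edges])
```

Because the points are equally spaced in mel rather than in Hz, the filters are narrow at low frequencies and wide at high ones; adapting the bank during training amounts to moving these points toward the frequencies that matter for the task.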