Australian Institute of Health Innovation, Macquarie University, Sydney, NSW 2109, Australia.
Sensors (Basel). 2021 May 14;21(10):3434. doi: 10.3390/s21103434.
Audio signal classification has many applications in healthcare, particularly in detecting and monitoring health conditions. Convolutional neural networks (CNNs) have produced state-of-the-art results in image classification and are increasingly used for other tasks, including signal classification. However, classifying audio signals with CNNs presents several challenges. In image classification, raw images of equal dimensions can be fed directly to a CNN. Raw time-domain signals, in contrast, can vary in length. In addition, the time-domain signal often has to be transformed to the frequency domain to reveal its distinctive spectral characteristics, which adds a signal transformation step. In this work, we review and benchmark various audio signal representation techniques for classification with CNNs, including approaches that handle signals of different lengths and approaches that combine multiple representations to improve classification accuracy. This work thus provides empirical evidence that may guide future work deploying CNNs for audio signal classification.
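The two preprocessing issues named in the abstract (variable signal length and the need for a frequency-domain representation) can be illustrated with a minimal sketch. It assumes zero-padding or truncation to a fixed length and a log-magnitude short-time Fourier transform; these are common choices, not necessarily the techniques benchmarked in the paper. The function names `fix_length` and `log_spectrogram`, the frame and hop sizes, and the synthetic sine-wave clips are all hypothetical and chosen only for illustration.

```python
import numpy as np

def fix_length(signal, target_len):
    """Zero-pad or truncate a 1-D signal to exactly target_len samples."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) >= target_len:
        return signal[:target_len]
    return np.pad(signal, (0, target_len - len(signal)))

def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT with a Hann window, shape (frame_len // 2 + 1, n_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T  # frequency bins along rows, time frames along columns

# Two synthetic clips of different durations (hypothetical data, 8 kHz sampling rate)
sr = 8000
short_clip = np.sin(2 * np.pi * 440 * np.arange(int(0.5 * sr)) / sr)
long_clip = np.sin(2 * np.pi * 880 * np.arange(int(1.5 * sr)) / sr)

# Bring both clips to one second, then build equal-sized 2-D inputs
target_len = sr
specs = [log_spectrogram(fix_length(x, target_len)) for x in (short_clip, long_clip)]

# Stack into a (batch, channel, frequency, time) array, the layout many CNN frameworks expect
batch = np.stack(specs)[:, np.newaxis, :, :]
print(batch.shape)
```

After this step every clip yields an array of the same shape regardless of its original duration, so the clips can be batched like equally sized images. Stacking several representations of the same clip along the channel axis would be one way to combine multiple representations, again as an assumption rather than the paper's stated method.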