Castro-Ospina Andrés Eduardo, Solarte-Sanchez Miguel Angel, Vega-Escobar Laura Stella, Isaza Claudia, Martínez-Vargas Juan David
Grupo de Investigación Máquinas Inteligentes y Reconocimiento de Patrones, Instituto Tecnológico Metropolitano, Medellín 050013, Colombia.
SISTEMIC, Electronic Engineering Department, Universidad de Antioquia-UdeA, Medellín 050010, Colombia.
Sensors (Basel). 2024 Mar 26;24(7):2106. doi: 10.3390/s24072106.
Sound classification supports the interpretation, analysis, and use of acoustic data across a wide range of practical applications, among which environmental sound analysis is one of the most important. In this paper, we explore the representation of audio data as graphs for sound classification. We propose a methodology that uses pre-trained audio models to extract deep features from audio files; these features then serve as node information for building graphs. We then train several graph neural networks (GNNs), specifically graph convolutional networks (GCNs), GraphSAGE, and graph attention networks (GATs), to solve multi-class audio classification problems. Our findings show that graphs are an effective representation of audio data and that GNNs are competitive in sound classification tasks. The GAT model was the top performer, achieving a mean accuracy of 83% in classifying environmental sounds and 91% in identifying the land cover of a site from its audio recording. This study thus provides novel insights into the potential of graph representation learning techniques for analyzing audio data.
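As an illustration of the general idea only, the following minimal sketch builds a graph whose nodes carry feature vectors and applies one graph-convolution step to it. Everything specific here is an assumption not stated in the abstract: the random matrix stands in for deep features from a pre-trained audio model, the graph is built by cosine-similarity k-nearest neighbours, and the names `knn_adjacency` and `gcn_layer` are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def knn_adjacency(features, k):
    """Symmetric k-nearest-neighbour adjacency from cosine similarity.

    Assumption: the abstract does not say how edges are chosen; k-NN is
    used here purely as an illustrative choice.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a node is never its own neighbour
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    n = features.shape[0]
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected


def gcn_layer(adj, x, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ weight, 0.0)


rng = np.random.default_rng(0)
# Stand-in for deep audio features: 8 recordings, 16 dimensions each.
embeddings = rng.normal(size=(8, 16))
adjacency = knn_adjacency(embeddings, k=3)
# Untrained random weights; a real model would learn these.
weights = rng.normal(size=(16, 4))
hidden = gcn_layer(adjacency, embeddings, weights)
```

In practice, a graph learning library would supply trainable GCN, GraphSAGE, and GAT layers; the hand-written layer above only shows how node features are mixed along graph edges.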