Graph-Based Audio Classification Using Pre-Trained Models and Graph Neural Networks

Authors

Castro-Ospina Andrés Eduardo, Solarte-Sanchez Miguel Angel, Vega-Escobar Laura Stella, Isaza Claudia, Martínez-Vargas Juan David

Affiliations

Grupo de Investigación Máquinas Inteligentes y Reconocimiento de Patrones, Instituto Tecnológico Metropolitano, Medellín 050013, Colombia.

SISTEMIC, Electronic Engineering Department, Universidad de Antioquia-UdeA, Medellín 050010, Colombia.

Publication information

Sensors (Basel). 2024 Mar 26;24(7):2106. doi: 10.3390/s24072106.

DOI:10.3390/s24072106

PMID:38610318

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11014159/

Abstract

Sound classification plays a crucial role in enhancing the interpretation, analysis, and use of acoustic data, leading to a wide range of practical applications, of which environmental sound analysis is one of the most important. In this paper, we explore the representation of audio data as graphs in the context of sound classification. We propose a methodology that leverages pre-trained audio models to extract deep features from audio files, which are then employed as node information to build graphs. Subsequently, we train various graph neural networks (GNNs), specifically graph convolutional networks (GCNs), GraphSAGE, and graph attention networks (GATs), to solve multi-class audio classification problems. Our findings underscore the effectiveness of employing graphs to represent audio data. Moreover, they highlight the competitive performance of GNNs in sound classification endeavors, with the GAT model emerging as the top performer, achieving a mean accuracy of 83% in classifying environmental sounds and 91% in identifying the land cover of a site based on its audio recording. In conclusion, this study provides novel insights into the potential of graph representation learning techniques for analyzing audio data.
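
The pipeline described in the abstract (deep features as node information, a graph built over them, then a GNN) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how edges are formed, so linking each node to its k most cosine-similar nodes is an assumption made here for illustration. The embedding values are toy numbers standing in for features from a pre-trained audio model, and the names `cosine`, `knn_graph`, and `gcn_layer` are hypothetical. The layer shown is the standard symmetric-normalized GCN propagation step with an untrained weight matrix.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_graph(features, k):
    """Symmetric k-NN adjacency (assumed edge rule, not stated in the abstract)."""
    n = len(features)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(features[i], features[j]),
                        reverse=True)
        for j in ranked[:k]:
            neighbors[i].add(j)
            neighbors[j].add(i)  # keep the graph undirected
    return neighbors

def gcn_layer(features, neighbors, weight):
    """One GCN-style layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(features)
    deg = {i: len(neighbors[i]) + 1 for i in range(n)}  # +1 for the self-loop
    out = []
    for i in range(n):
        agg = [0.0] * len(features[i])
        for j in neighbors[i] | {i}:
            c = 1.0 / math.sqrt(deg[i] * deg[j])
            for d, x in enumerate(features[j]):
                agg[d] += c * x
        # Linear map; weight has shape (in_dim, out_dim).
        h = [sum(agg[d] * weight[d][o] for d in range(len(agg)))
             for o in range(len(weight[0]))]
        out.append([max(0.0, v) for v in h])
    return out

# Toy stand-ins for deep features of four audio clips (hypothetical values).
embeddings = [
    [1.0, 0.1], [0.9, 0.2],  # two clips that resemble each other
    [0.1, 1.0], [0.2, 0.9],  # two clips of a different kind
]
graph = knn_graph(embeddings, k=1)
identity = [[1.0, 0.0], [0.0, 1.0]]  # untrained weights, for illustration only
hidden = gcn_layer(embeddings, graph, identity)
```

In this toy case each clip is linked only to its closest counterpart, and the layer averages a node's features with those of its neighbor. A trained model would stack such layers, learn `weight`, and end with a classification head. GraphSAGE and GAT differ in how neighbors are aggregated: sampled aggregation and learned attention weights, respectively.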


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b2e/11014159/b017de45f5d9/sensors-24-02106-g001.jpg

Similar articles

A simple and effective convolutional operator for node classification without features by graph convolutional networks.

PLoS One. 2024 Apr 30;19(4):e0301476. doi: 10.1371/journal.pone.0301476. eCollection 2024.

Semisupervised Graph Neural Networks for Graph Classification.

IEEE Trans Cybern. 2023 Oct;53(10):6222-6235. doi: 10.1109/TCYB.2022.3164696. Epub 2023 Sep 15.

Finding core labels for maximizing generalization of graph neural networks.

Neural Netw. 2024 Dec;180:106635. doi: 10.1016/j.neunet.2024.106635. Epub 2024 Aug 14.

Locality preserving dense graph convolutional networks with graph context-aware node representations.

Neural Netw. 2021 Nov;143:108-120. doi: 10.1016/j.neunet.2021.05.031. Epub 2021 Jun 2.

Graph convolutional networks: a comprehensive review.

Comput Soc Netw. 2019;6(1):11. doi: 10.1186/s40649-019-0069-y. Epub 2019 Nov 10.

Beyond low-pass filtering on large-scale graphs via Adaptive Filtering Graph Neural Networks.

Neural Netw. 2024 Jan;169:1-10. doi: 10.1016/j.neunet.2023.09.042. Epub 2023 Oct 11.

Boosting-GNN: Boosting Algorithm for Graph Networks on Imbalanced Node Classification.

Front Neurorobot. 2021 Nov 25;15:775688. doi: 10.3389/fnbot.2021.775688. eCollection 2021.

MGLNN: Semi-supervised learning via Multiple Graph Cooperative Learning Neural Networks.

Neural Netw. 2022 Sep;153:204-214. doi: 10.1016/j.neunet.2022.05.024. Epub 2022 Jun 3.

Path-enhanced graph convolutional networks for node classification without features.

PLoS One. 2023 Jun 9;18(6):e0287001. doi: 10.1371/journal.pone.0287001. eCollection 2023.

Cited by

Research on Acoustic Scene Classification Based on Time-Frequency-Wavelet Fusion Network.

Sensors (Basel). 2025 Jun 24;25(13):3930. doi: 10.3390/s25133930.

Impact of transfer learning methods and dataset characteristics on generalization in birdsong classification.

Sci Rep. 2025 May 9;15(1):16273. doi: 10.1038/s41598-025-00996-2.

