
Hierarchical-Concatenate Fusion TDNN for sound event classification.

Affiliation

School of Information Science and Engineering, Shenyang University of Technology, Shenyang City, Liaoning Province, China.

Publication

PLoS One. 2024 Oct 31;19(10):e0312998. doi: 10.1371/journal.pone.0312998. eCollection 2024.

Abstract

The semantic feature combination/parsing issue is one of the key problems in sound event classification for acoustic scene analysis, environmental sound monitoring, and urban soundscape analysis. In acoustic scene classification, the input audio signal is composed of multiple acoustic events, which usually leads to a low recognition rate in complex environments. To address this issue, this paper proposes the Hierarchical-Concatenate Fusion (HCF)-TDNN model, which adds an HCF module to the ECAPA-TDNN model for sound event classification. In the HCF module, the audio signal is first converted into two-dimensional time-frequency features, which are then segmented. Each segment is convolved in turn, compensating for the small receptive field when perceiving fine details. Finally, after each convolution, the two adjacent parts are combined before the next convolution, enlarging the receptive field for capturing large targets. The improved model thus further enhances scalability by emphasizing channel attention and the efficient propagation and aggregation of feature information. The proposed model is trained and validated on the UrbanSound8K dataset. Experimental results show that it achieves a best classification accuracy of 95.83%, a relative improvement of approximately 5% over the ECAPA-TDNN model.
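The hierarchical split-convolve-concatenate scheme the abstract describes can be sketched as follows. This is a minimal illustration based only on the abstract's wording, not the authors' implementation: the channel dimension is split into parts, each part is convolved, and each part is fused with its already-processed neighbour before its own convolution, progressively enlarging the receptive field. The `conv3x3` averaging filter is a stand-in for a learned convolution, and the feature-map shape is hypothetical.

```python
import numpy as np

def conv3x3(x):
    # Stand-in for a learned 3x3 convolution: a fixed per-channel
    # averaging filter with edge padding (illustration only).
    k = np.ones((3, 3)) / 9.0
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.empty_like(x)
    C, H, W = x.shape
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(pad[c, i:i+3, j:j+3] * k)
    return out

def hcf_block(features, n_splits=4):
    """Hierarchical-Concatenate Fusion sketch, per the abstract:
    split the channels, convolve each split one by one, and combine
    each split with the previous split's output before convolving,
    so later splits see a progressively larger receptive field."""
    splits = np.array_split(features, n_splits, axis=0)
    outputs = []
    prev = None
    for s in splits:
        if prev is not None:
            # combine the two adjacent parts before the next convolution
            s = s + prev[: s.shape[0]]
        prev = conv3x3(s)
        outputs.append(prev)
    # concatenate all processed splits back along the channel axis
    return np.concatenate(outputs, axis=0)

x = np.random.rand(16, 40, 100)  # (channels, mel bins, frames) - hypothetical shape
y = hcf_block(x)
print(y.shape)  # (16, 40, 100)
```

Because each split is fused with the output of the previous split, the last split's effective receptive field spans several stacked 3x3 convolutions, which is the "enlarging the receptive field for large targets" behaviour the abstract attributes to the HCF module.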


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/999f/11527289/3d69a2ad1860/pone.0312998.g001.jpg
