Hierarchical-Concatenate Fusion TDNN for sound event classification.

Affiliation

School of Information Science and Engineering, Shenyang University of Technology, Shenyang City, Liaoning Province, China.

Publication information

PLoS One. 2024 Oct 31;19(10):e0312998. doi: 10.1371/journal.pone.0312998. eCollection 2024.

DOI:10.1371/journal.pone.0312998
PMID:39480755
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11527289/
Abstract

Combining and parsing semantic features is one of the key problems in sound event classification for acoustic scene analysis, environmental sound monitoring, and urban soundscape analysis. In acoustic scene classification, the input audio signal is composed of multiple acoustic events, which usually leads to a low recognition rate in complex environments. To address this issue, this paper proposes the Hierarchical-Concatenate Fusion (HCF)-TDNN model, which adds an HCF module to the ECAPA-TDNN model for sound event classification. In the HCF module, the audio signal is first converted into two-dimensional time-frequency features, which are then segmented. Next, the segments are convolved one by one, so that a small receptive field captures fine details. Finally, once a convolution is complete, each pair of adjacent parts is combined before the next convolution, which enlarges the receptive field for capturing large targets. The improved model thereby further enhances scalability by emphasizing channel attention and the efficient propagation and aggregation of feature information. The proposed model is trained and validated on the UrbanSound8K dataset. The experimental results show that the proposed model achieves a best classification accuracy of 95.83%, a relative improvement of approximately 5% over the ECAPA-TDNN model.
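
The segment, convolve, and merge steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that segments are taken along the channel axis, that the number of segments is a power of two, and that a fixed averaging kernel stands in for each learned convolution. The names `conv1d_same`, `averaging_weight`, and `hcf_sketch` are hypothetical.

```python
import numpy as np


def conv1d_same(x, weight):
    """Channel-mixing 1-D convolution over time that keeps the frame count.

    x has shape (in_channels, frames); weight has shape
    (out_channels, in_channels, kernel_size).
    """
    out_ch, in_ch, _ = weight.shape
    return np.stack([
        sum(np.convolve(x[i], weight[o, i], mode="same") for i in range(in_ch))
        for o in range(out_ch)
    ])


def averaging_weight(channels, kernel_size):
    # Assumption: a uniform kernel replaces the learned filters of a real model.
    return np.full((channels, channels, kernel_size),
                   1.0 / (channels * kernel_size))


def hcf_sketch(features, num_segments=4, kernel_size=3):
    """Hierarchical-concatenate fusion over a (channels, frames) feature map.

    Returns the fused features and the temporal receptive field in frames.
    """
    if num_segments < 1 or num_segments & (num_segments - 1):
        raise ValueError("num_segments must be a power of two")
    if features.shape[0] % num_segments:
        raise ValueError("channel count must be divisible by num_segments")

    # Step 1: segment the time-frequency features (here: along channels).
    parts = np.split(features, num_segments, axis=0)

    # Step 2: convolve each segment on its own (small receptive field).
    parts = [conv1d_same(p, averaging_weight(p.shape[0], kernel_size))
             for p in parts]
    receptive_field = kernel_size

    # Step 3: concatenate adjacent pairs, then convolve each merged block.
    while len(parts) > 1:
        merged = [np.concatenate(parts[i:i + 2], axis=0)
                  for i in range(0, len(parts), 2)]
        parts = [conv1d_same(m, averaging_weight(m.shape[0], kernel_size))
                 for m in merged]
        receptive_field += kernel_size - 1  # each stacked conv widens the view

    return parts[0], receptive_field
```

With four segments and a kernel size of 3, the sketch stacks three convolutions, so the reported receptive field is 7 frames, and information from the first segment reaches every output channel only after the final merge.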

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/999f/11527289/3d69a2ad1860/pone.0312998.g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/999f/11527289/6448c2e4461e/pone.0312998.g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/999f/11527289/eed85884b28c/pone.0312998.g003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/999f/11527289/92c1c3b65bf4/pone.0312998.g004.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/999f/11527289/7e5154efe3d2/pone.0312998.g005.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/999f/11527289/5165588d8b14/pone.0312998.g006.jpg
Figure 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/999f/11527289/b33485d0e5b3/pone.0312998.g007.jpg
Figure 8: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/999f/11527289/d2abc440b6e3/pone.0312998.g008.jpg
