An Incremental Class-Learning Approach with Acoustic Novelty Detection for Acoustic Event Recognition.

Affiliations

Computer Engineering Department, Faculty of Computer and Informatics Engineering, Istanbul Technical University, Istanbul 34469, Turkey.

Artificial Intelligence and Data Science Application and Research Center, Istanbul Technical University, Istanbul 34469, Turkey.

Publication information

Sensors (Basel). 2021 Oct 5;21(19):6622. doi: 10.3390/s21196622.

DOI:10.3390/s21196622
PMID:34640943
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8512090/
Abstract

Acoustic scene analysis (ASA) relies on the dynamic sensing and understanding of stationary and non-stationary sounds from various events, background noises and human actions with objects. However, the spatio-temporal nature of the sound signals may not be stationary, and novel events may exist that eventually deteriorate the performance of the analysis. In this study, a self-learning-based ASA for acoustic event recognition (AER) is presented to detect and incrementally learn novel acoustic events by tackling catastrophic forgetting. The proposed ASA framework comprises six elements: (1) raw acoustic signal pre-processing, (2) low-level and deep audio feature extraction, (3) acoustic novelty detection (AND), (4) acoustic signal augmentations, (5) incremental class-learning (ICL) (of the audio features of the novel events) and (6) AER. The self-learning on different types of audio features extracted from the acoustic signals of various events occurs without human supervision. For the extraction of deep audio representations, in addition to visual geometry group (VGG) and residual neural network (ResNet), time-delay neural network (TDNN) and TDNN based long short-term memory (TDNN-LSTM) networks are pre-trained using a large-scale audio dataset, Google AudioSet. The performances of ICL with AND using Mel-spectrograms, and deep features with TDNNs, VGG, and ResNet from the Mel-spectrograms are validated on benchmark audio datasets such as ESC-10, ESC-50, UrbanSound8K (US8K), and an audio dataset collected by the authors in a real domestic environment.
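
The detect-then-learn loop described in the abstract (novelty detection, augmentation, incremental class-learning, recognition) can be illustrated with a deliberately small sketch. It is not the authors' model: it swaps in a nearest-class-mean classifier with a distance threshold as the novelty test, uses two-dimensional toy vectors in place of Mel-spectrogram or deep features, and stands in a fixed offset for signal augmentation. The class name `PrototypeRecognizer`, the threshold value, and the labels are all hypothetical.

```python
import math


class PrototypeRecognizer:
    """Nearest-class-mean recognizer that adds classes without retraining.

    Hypothetical stand-in for the AND + ICL + AER stages, not the paper's model.
    """

    def __init__(self, novelty_threshold):
        self.novelty_threshold = novelty_threshold  # assumed distance cut-off
        self.sums = {}    # label -> per-dimension running sum of features
        self.counts = {}  # label -> number of samples seen

    def learn(self, label, samples):
        """Update one class only; stored statistics of other classes stay intact."""
        for sample in samples:
            if label not in self.sums:
                self.sums[label] = [0.0] * len(sample)
                self.counts[label] = 0
            self.sums[label] = [s + x for s, x in zip(self.sums[label], sample)]
            self.counts[label] += 1

    def prototype(self, label):
        """Mean feature vector of a class."""
        n = self.counts[label]
        return [s / n for s in self.sums[label]]

    def nearest(self, feature):
        """Return (label, distance) for the closest class prototype."""
        return min(
            ((label, math.dist(feature, self.prototype(label))) for label in self.sums),
            key=lambda pair: pair[1],
        )

    def recognize(self, feature):
        """Return a known label, or None when the sample looks novel."""
        label, distance = self.nearest(feature)
        return None if distance > self.novelty_threshold else label


model = PrototypeRecognizer(novelty_threshold=1.0)
model.learn("dog_bark", [[0.0, 0.0], [0.2, 0.0]])
model.learn("door_knock", [[5.0, 5.0], [5.0, 5.2]])

unknown = [10.0, 0.0]  # toy feature vector far from both known classes
if model.recognize(unknown) is None:
    # Toy "augmentation": one shifted copy of the novel sample.
    augmented = [unknown, [x + 0.1 for x in unknown]]
    # Self-assigned label, so no human supervision is involved.
    model.learn("novel_0", augmented)

print(model.recognize([10.0, 0.1]))  # the newly learned class
print(model.recognize([0.1, 0.0]))   # an old class is still recognized
```

Because each class keeps its own running mean, adding `novel_0` cannot overwrite what was stored for `dog_bark` or `door_knock`, which is the simplest way to sidestep catastrophic forgetting. A neural classifier of the kind the abstract describes would need an explicit mechanism for this instead.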

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/71c7/8512090/af94ff7dcb36/sensors-21-06622-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/71c7/8512090/3878301d3963/sensors-21-06622-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/71c7/8512090/926d7feeec6f/sensors-21-06622-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/71c7/8512090/da237f606361/sensors-21-06622-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/71c7/8512090/34baceb6bd36/sensors-21-06622-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/71c7/8512090/83cf1fa51e33/sensors-21-06622-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/71c7/8512090/6a0202eb8a21/sensors-21-06622-g007.jpg
