
An Incremental Class-Learning Approach with Acoustic Novelty Detection for Acoustic Event Recognition.

Affiliations

Computer Engineering Department, Faculty of Computer and Informatics Engineering, Istanbul Technical University, Istanbul 34469, Turkey.

Artificial Intelligence and Data Science Application and Research Center, Istanbul Technical University, Istanbul 34469, Turkey.

Publication

Sensors (Basel). 2021 Oct 5;21(19):6622. doi: 10.3390/s21196622.

Abstract

Acoustic scene analysis (ASA) relies on the dynamic sensing and understanding of stationary and non-stationary sounds from various events, background noises and human actions with objects. However, the spatio-temporal nature of the sound signals may not be stationary, and novel events may exist that eventually deteriorate the performance of the analysis. In this study, a self-learning-based ASA for acoustic event recognition (AER) is presented to detect and incrementally learn novel acoustic events by tackling catastrophic forgetting. The proposed ASA framework comprises six elements: (1) raw acoustic signal pre-processing, (2) low-level and deep audio feature extraction, (3) acoustic novelty detection (AND), (4) acoustic signal augmentations, (5) incremental class-learning (ICL) (of the audio features of the novel events) and (6) AER. The self-learning on different types of audio features extracted from the acoustic signals of various events occurs without human supervision. For the extraction of deep audio representations, in addition to visual geometry group (VGG) and residual neural network (ResNet), time-delay neural network (TDNN) and TDNN based long short-term memory (TDNN-LSTM) networks are pre-trained using a large-scale audio dataset, Google AudioSet. The performances of ICL with AND using Mel-spectrograms, and deep features with TDNNs, VGG, and ResNet from the Mel-spectrograms are validated on benchmark audio datasets such as ESC-10, ESC-50, UrbanSound8K (US8K), and an audio dataset collected by the authors in a real domestic environment.
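The abstract names Mel-spectrograms as the base audio representation from which the deep features (VGG, ResNet, TDNN) are extracted. As a rough illustration of that first feature-extraction step, here is a minimal NumPy-only sketch of Mel-spectrogram computation; the frame size, hop, sample rate, and filter count below are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    # Standard HTK-style Hz-to-Mel mapping
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the Mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, apply a Hann window, take the power STFT,
    # then project each frame onto the Mel filterbank
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    return mel_filterbank(n_mels, n_fft, sr) @ power.T  # (n_mels, n_frames)
```

In practice a library such as librosa would be used for this step; the sketch only shows the shape of the computation. A one-second 16 kHz signal with these settings yields a 40 x 61 time-frequency matrix, which can then be fed to the pre-trained feature extractors.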


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/71c7/8512090/af94ff7dcb36/sensors-21-06622-g001.jpg
