EXPLORING SELF-SUPERVISED CONTRASTIVE LEARNING OF SPATIAL SOUND EVENT REPRESENTATION.

Authors

Jiang Xilin, Han Cong, Li Yinghao Aaron, Mesgarani Nima

Affiliation

Department of Electrical Engineering, Columbia University, USA.

Publication

Proc IEEE Int Conf Acoust Speech Signal Process. 2024 Apr;2024:1281-1285. doi: 10.1109/icassp48485.2024.10447391. Epub 2024 Mar 18.

Abstract

In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode the 'what' and 'where' of spatial audio. MC-SimCLR learns joint spectral and spatial representations from unlabeled spatial audio, thereby enhancing both event classification and sound localization in downstream tasks. At its core, we propose a multi-level data augmentation pipeline that augments different levels of audio features, including waveforms, Mel spectrograms, and generalized cross-correlation (GCC) features. In addition, we introduce simple yet effective channel-wise augmentation methods that randomly swap the order of the microphones and mask Mel and GCC channels. By using these augmentations, we find that linear layers on top of the learned representation significantly outperform supervised models in terms of both event classification accuracy and localization error. We also perform a comprehensive analysis of the effect of each augmentation method and a comparison of the fine-tuning performance using different amounts of labeled data.
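The channel-wise augmentations and the GCC feature mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`swap_channels`, `mask_channels`, `gcc_phat`), the masking probability `p`, the array layouts, and the use of the PHAT weighting are all assumptions made for illustration, and the sketch does not show how the paper pairs augmented views or handles localization labels after a swap.

```python
import numpy as np

def swap_channels(wave, rng):
    """Randomly permute the microphone axis of a (channels, samples) array."""
    perm = rng.permutation(wave.shape[0])
    return wave[perm], perm

def mask_channels(features, rng, p=0.5):
    """Zero out whole feature channels of a (channels, freq, time) array.

    Each channel (e.g. one Mel or one GCC map) is dropped with probability p
    (a hypothetical parameter); at least one channel is always kept.
    """
    keep = rng.random(features.shape[0]) >= p
    if not keep.any():
        keep[rng.integers(features.shape[0])] = True
    return features * keep[:, None, None], keep

def gcc_phat(x, y, max_lag):
    """GCC with PHAT weighting between two 1-D signals.

    Returns the correlation for lags -max_lag..max_lag (index i <-> lag i - max_lag).
    """
    n = len(x) + len(y)                    # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12         # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    return np.concatenate([cc[-max_lag:], cc[:max_lag + 1]])
```

Under the sign convention used here, if `y` is a copy of `x` delayed by `d` samples, the peak of `gcc_phat(x, y, max_lag)` falls at lag `-d`.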
