EXPLORING SELF-SUPERVISED CONTRASTIVE LEARNING OF SPATIAL SOUND EVENT REPRESENTATION.

Authors

Jiang Xilin, Han Cong, Li Yinghao Aaron, Mesgarani Nima

Affiliation

Department of Electrical Engineering, Columbia University, USA.

Publication information

Proc IEEE Int Conf Acoust Speech Signal Process. 2024 Apr;2024:1281-1285. doi: 10.1109/icassp48485.2024.10447391. Epub 2024 Mar 18.

DOI:10.1109/icassp48485.2024.10447391

PMID:39049981

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11268432/

Abstract

In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode 'what' and 'where' of spatial audios. MC-SimCLR learns joint spectral and spatial representations from unlabeled spatial audios, thereby enhancing both event classification and sound localization in downstream tasks. At its core, we propose a multi-level data augmentation pipeline that augments different levels of audio features, including waveforms, Mel spectrograms, and generalized cross-correlation (GCC) features. In addition, we introduce simple yet effective channel-wise augmentation methods to randomly swap the order of the microphones and mask Mel and GCC channels. By using these augmentations, we find that linear layers on top of the learned representation significantly outperform supervised models in terms of both event classification accuracy and localization error. We also perform a comprehensive analysis of the effect of each augmentation method and a comparison of the fine-tuning performance using different amounts of labeled data.
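The channel-wise augmentations mentioned in the abstract (randomly swapping microphone order and masking feature channels) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `swap_microphones` and `mask_channels`, the array layouts, and the choice to mask by zeroing are all assumptions made for illustration.

```python
import numpy as np


def swap_microphones(waveform, rng):
    """Randomly permute the microphone axis of a (mics, samples) array.

    Assumption: the swap is applied to the raw waveform so that any
    features computed afterwards (Mel, GCC) stay mutually consistent.
    """
    perm = rng.permutation(waveform.shape[0])
    return waveform[perm], perm


def mask_channels(features, num_masked, rng):
    """Zero out `num_masked` randomly chosen channels.

    Assumption: `features` is a (channels, frames, bins) stack of
    Mel and GCC maps, and masking means replacing a channel with zeros.
    """
    masked = features.copy()
    idx = rng.choice(features.shape[0], size=num_masked, replace=False)
    masked[idx] = 0.0
    return masked, idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Hypothetical two-microphone recording with 16000 samples.
    waveform = rng.standard_normal((2, 16000))
    swapped, perm = swap_microphones(waveform, rng)
    print("microphone order after swap:", perm)

    # Hypothetical feature stack: 2 Mel channels + 1 GCC channel.
    features = rng.standard_normal((3, 100, 64))
    masked, idx = mask_channels(features, num_masked=1, rng=rng)
    print("masked channel index:", idx)
```

In a contrastive setup, two independently augmented views of the same clip would typically be produced this way and treated as a positive pair; how the paper combines these steps with its waveform-level and spectrogram-level augmentations is not specified in the abstract.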

Similar articles

Local contrastive loss with pseudo-label based self-training for semi-supervised medical image segmentation.

Med Image Anal. 2023 Jul;87:102792. doi: 10.1016/j.media.2023.102792. Epub 2023 Mar 11.

Self-Supervised Contrastive Representation Learning for Semi-Supervised Time-Series Classification.

IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15604-15618. doi: 10.1109/TPAMI.2023.3308189. Epub 2023 Nov 3.

Weakly-supervised learning-based pathology detection and localization in 3D chest CT scans.

Med Phys. 2024 Nov;51(11):8272-8282. doi: 10.1002/mp.17302. Epub 2024 Aug 14.

Self-Supervised Action Representation Learning Based on Asymmetric Skeleton Data Augmentation.

Sensors (Basel). 2022 Nov 20;22(22):8989. doi: 10.3390/s22228989.

X-Invariant Contrastive Augmentation and Representation Learning for Semi-Supervised Skeleton-Based Action Recognition.

IEEE Trans Image Process. 2022;31:3852-3867. doi: 10.1109/TIP.2022.3175605. Epub 2022 Jun 2.

Boundary-aware information maximization for self-supervised medical image segmentation.

Med Image Anal. 2024 May;94:103150. doi: 10.1016/j.media.2024.103150. Epub 2024 Mar 28.

Reducing annotation burden in MR: A novel MR-contrast guided contrastive learning approach for image segmentation.

Med Phys. 2024 Apr;51(4):2707-2720. doi: 10.1002/mp.16820. Epub 2023 Nov 13.

Cross-view motion consistent self-supervised video inter-intra contrastive for action representation understanding.

Neural Netw. 2024 Nov;179:106578. doi: 10.1016/j.neunet.2024.106578. Epub 2024 Jul 26.

AFSC: A self-supervised augmentation-free spatial clustering method based on contrastive learning for identifying spatial domains.

Comput Struct Biotechnol J. 2024 Sep 10;23:3358-3367. doi: 10.1016/j.csbj.2024.09.005. eCollection 2024 Dec.
