Contrastive Positive Sample Propagation Along the Audio-Visual Event Line.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):7239-7257. doi: 10.1109/TPAMI.2022.3223688. Epub 2023 May 5.

Abstract

Visual and audio signals often coexist in natural environments, forming audio-visual events (AVEs). Given a video, we aim to localize the video segments that contain an AVE and identify its category. Learning discriminative features for each video segment is pivotal. Unlike existing work, which focuses on audio-visual feature fusion, this paper proposes a new contrastive positive sample propagation (CPSP) method for better deep feature representation learning. The contribution of CPSP is to introduce the available full or weak label as a prior that constructs the exact positive-negative samples for contrastive learning. Specifically, CPSP involves comprehensive contrastive constraints: pair-level positive sample propagation (PSP), and segment-level and video-level positive sample activation (PSA_S and PSA_V). Three new contrastive objectives are proposed and introduced into both fully and weakly supervised AVE localization. To draw a complete picture of contrastive learning in AVE localization, we also study self-supervised positive sample propagation (SSPSP). As a result, CPSP helps obtain refined audio-visual features that are distinguishable from the negatives, thus benefiting the classifier prediction. Extensive experiments on the AVE and the newly collected VGGSound-AVEL100k datasets verify the effectiveness and generalization ability of our method.
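The abstract's central idea, using labels as a prior to decide which samples count as positives and which as negatives, can be illustrated with a generic supervised-contrastive-style loss. The sketch below is not the paper's formulation: the function names, the cosine similarity, the temperature value, and the toy features are all assumptions made for illustration, and the paper's actual objectives are not given in this record.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def label_guided_contrastive_loss(features, labels, temperature=0.1):
    """Hypothetical label-guided contrastive loss over segment features.

    For each anchor segment, segments sharing its label are positives and
    all remaining segments are negatives. Anchors without any positive are
    skipped. This is a generic illustration, not the objectives of the paper.
    """
    n = len(features)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Exponentiated, temperature-scaled similarity to every other segment.
        exp_sim = {
            j: math.exp(cosine(features[i], features[j]) / temperature)
            for j in range(n)
            if j != i
        }
        denom = sum(exp_sim.values())
        # Average negative log-probability assigned to the positives.
        total += -sum(math.log(exp_sim[j] / denom) for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy data (assumed): label 1 = segment contains the event, 0 = background.
labels = [1, 1, 0, 0]
separated = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

loss_separated = label_guided_contrastive_loss(separated, labels)
loss_mixed = label_guided_contrastive_loss(mixed, labels)
print(loss_separated < loss_mixed)
```

In this toy setup the loss is lower when same-label segments already lie close together, which is the behavior a label-derived positive-negative split is meant to encourage.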
