Self-Supervised Video Representation Learning by Video Incoherence Detection

Author information

Cao Haozhi, Xu Yuecong, Mao Kezhi, Xie Lihua, Yin Jianxiong, See Simon, Xu Qianwen, Yang Jianfei

Publication information

IEEE Trans Cybern. 2024 Jun;54(6):3810-3822. doi: 10.1109/TCYB.2023.3265393. Epub 2024 May 30.

DOI:10.1109/TCYB.2023.3265393

Abstract

This article introduces a novel self-supervised method that leverages incoherence detection for video representation learning. It stems from the observation that the visual system of human beings can easily identify video incoherence based on their comprehensive understanding of videos. Specifically, we construct the incoherent clip by multiple subclips hierarchically sampled from the same raw video with various lengths of incoherence. The network is trained to learn the high-level representation by predicting the location and length of incoherence given the incoherent clip as input. Additionally, we introduce intravideo contrastive learning to maximize the mutual information between incoherent clips from the same raw video. We evaluate our proposed method through extensive experiments on action recognition and video retrieval using various backbone networks. Experiments show that our proposed method achieves remarkable performance across different backbone networks and different datasets compared to previous coherence-based methods.
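
The clip construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simplified case with only two subclips joined across a single jump, whereas the abstract describes multiple hierarchically sampled subclips. The function name `make_incoherent_clip` and the parameters `num_frames`, `clip_len`, and `max_gap` are hypothetical, introduced here for illustration only.

```python
import random


def make_incoherent_clip(num_frames, clip_len, max_gap, rng=random):
    """Return (frame_indices, location, gap) for one incoherent clip.

    Assumed simplification: two subclips from the same video are joined,
    with `gap` frames skipped between them. `location` is the position in
    the clip of the first frame after the jump.
    """
    if clip_len < 2:
        raise ValueError("clip_len must be at least 2")
    if max_gap < 1:
        raise ValueError("max_gap must be at least 1")
    if num_frames < clip_len + max_gap:
        raise ValueError("video too short for the requested clip and gap")

    location = rng.randint(1, clip_len - 1)  # where the incoherence occurs
    gap = rng.randint(1, max_gap)            # how many frames are skipped
    start = rng.randint(0, num_frames - clip_len - gap)

    # First subclip: `location` consecutive frames.
    first = list(range(start, start + location))
    # Second subclip: resumes after skipping `gap` frames.
    second_start = start + location + gap
    second = list(range(second_start, second_start + clip_len - location))
    return first + second, location, gap
```

Under these assumptions, `location` and `gap` would serve as the two self-supervised targets (where the incoherence is and how long it is), and two clips drawn from the same video with different random values could form a positive pair for the intravideo contrastive term.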
