Self-Supervised Video Representation Learning by Uncovering Spatio-Temporal Statistics.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2022 Jul;44(7):3791-3806. doi: 10.1109/TPAMI.2021.3057833. Epub 2022 Jun 3.

Abstract

This paper proposes a novel pretext task for self-supervised video representation learning. Specifically, given an unlabeled video clip, we compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion, and the spatial location and dominant color of the largest color diversity along the temporal axis. A neural network is then built and trained to predict these statistical summaries from the video frames. To ease learning, we employ several spatial partitioning patterns to encode rough spatial locations instead of exact Cartesian coordinates. Our approach is inspired by the observation that the human visual system is sensitive to rapidly changing content in the visual field and needs only an impression of rough spatial locations to understand that content. To validate the effectiveness of the proposed approach, we conduct extensive experiments with four 3D backbone networks: C3D, 3D-ResNet, R(2+1)D, and S3D-G. The results show that our approach outperforms existing approaches across these backbone networks on four downstream video analysis tasks: action recognition, video retrieval, dynamic scene recognition, and action similarity labeling. The source code is publicly available at: https://github.com/laura-wang/video_repres_sts.
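To make the idea of a motion-statistics label concrete, the following is a minimal illustrative sketch and not the authors' implementation (see the linked repository for that). It assumes the optical flow has already been computed and is given as a NumPy array of shape (T, H, W, 2). The function name `motion_statistics_label`, the uniform grid used as the partitioning pattern, and the number of direction bins are all hypothetical choices made for this example.

```python
import numpy as np


def motion_statistics_label(flow, grid=(4, 4), num_direction_bins=8):
    """Derive two coarse pseudo-labels from precomputed optical flow.

    flow: array of shape (T, H, W, 2) holding (dx, dy) per pixel per frame.
    Returns (location_label, direction_label), both plain ints.
    Hypothetical helper for illustration only.
    """
    _, height, width, _ = flow.shape
    rows, cols = grid

    # Accumulate the flow magnitude over time to get one motion map (H, W).
    magnitude = np.linalg.norm(flow, axis=-1).sum(axis=0)

    # Pixel with the largest accumulated motion.
    y, x = np.unravel_index(np.argmax(magnitude), magnitude.shape)

    # Encode the location as the index of the grid block that contains the
    # pixel, instead of regressing exact coordinates.
    block_row = min(int(y) * rows // height, rows - 1)
    block_col = min(int(x) * cols // width, cols - 1)
    location_label = block_row * cols + block_col

    # Dominant direction: angle of the time-summed flow vector at that pixel,
    # quantized into equal angular bins over [0, 2*pi).
    dx, dy = flow[:, y, x, :].sum(axis=0)
    angle = np.arctan2(dy, dx) % (2 * np.pi)
    bin_width = 2 * np.pi / num_direction_bins
    direction_label = int(angle // bin_width) % num_direction_bins

    return location_label, direction_label


if __name__ == "__main__":
    # Synthetic clip: a single moving pixel at row 6, column 1.
    demo_flow = np.zeros((4, 8, 8, 2))
    demo_flow[:, 6, 1] = (1.0, 2.0)
    print(motion_statistics_label(demo_flow))  # (12, 1)
```

In a pretext task of this kind, labels like these would serve as classification targets for the network. Using coarse block indices rather than coordinates reflects the abstract's point that rough spatial locations are enough.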
