
Contrastive Self-Supervised Pre-Training for Video Quality Assessment

Publication Information

IEEE Trans Image Process. 2022;31:458-471. doi: 10.1109/TIP.2021.3130536. Epub 2021 Dec 16.

Abstract

The video quality assessment (VQA) task remains a small-sample learning problem because manual annotation is costly. Since existing VQA datasets are of limited scale, prior research has tried to leverage models pre-trained on ImageNet to mitigate this shortage. However, such models, trained for image classification, can be sub-optimal when applied to VQA data from a significantly different domain. In this paper, we make the first attempt at self-supervised pre-training for the VQA task built on contrastive learning, aiming to exploit plentiful unlabeled video data to learn feature representations in a simple yet effective way. Specifically, we first generate distorted video samples with diverse distortion characteristics and visual contents using the proposed distortion augmentation strategy. We then conduct contrastive learning to capture quality-aware information by maximizing the agreement between the feature representations of future frames and their corresponding predictions in the embedding space. In addition, we introduce a distortion type prediction task as an auxiliary learning objective to push the model toward discriminating the distortion categories of the input video. Solving these prediction tasks jointly with contrastive learning not only provides stronger surrogate supervision signals but also captures the knowledge shared among the tasks. Extensive experiments demonstrate that our approach sets a new state of the art in self-supervised learning for the VQA task. Our results also show that the learned pre-trained model can significantly benefit existing learning-based VQA models. Source code is available at https://github.com/cpf0079/CSPT.
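The abstract does not spell out the loss functions, but the described objective, maximizing agreement between predicted and actual future-frame representations while jointly classifying distortion types, matches an InfoNCE-style formulation. The sketch below is a minimal Python/PyTorch illustration under that assumption; the function names, shapes, `temperature`, and `alpha` weighting are hypothetical and not taken from the paper. Consult the authors' repository linked above for the actual implementation.

```python
# A minimal sketch of the joint objective described in the abstract, assuming a
# CPC-style setup: `pred` holds predicted embeddings of future frames, `target`
# holds their actual embeddings. All names and hyperparameters are illustrative
# assumptions, not from the paper.
import torch
import torch.nn.functional as F

def info_nce_loss(pred, target, temperature=0.1):
    """Contrastive term: each predicted future-frame embedding should agree
    with its own target embedding and disagree with the others in the batch."""
    pred = F.normalize(pred, dim=1)      # (B, D) predicted embeddings
    target = F.normalize(target, dim=1)  # (B, D) actual future-frame embeddings
    logits = pred @ target.t() / temperature                  # (B, B) similarities
    labels = torch.arange(pred.size(0), device=pred.device)  # positives on diagonal
    return F.cross_entropy(logits, labels)

def joint_loss(pred, target, distortion_logits, distortion_labels, alpha=1.0):
    """Joint objective: contrastive agreement plus distortion-type
    classification, solved together as the abstract describes."""
    contrastive = info_nce_loss(pred, target)
    distortion = F.cross_entropy(distortion_logits, distortion_labels)
    return contrastive + alpha * distortion
```

The diagonal of the similarity matrix pairs each prediction with its own future frame (the positive), while the off-diagonal entries serve as in-batch negatives; the auxiliary cross-entropy term is one plausible reading of the distortion prediction task.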

