
Contrastive Self-Supervised Pre-Training for Video Quality Assessment

Publication Information

IEEE Trans Image Process. 2022;31:458-471. doi: 10.1109/TIP.2021.3130536. Epub 2021 Dec 16.

Abstract

The video quality assessment (VQA) task remains a small-sample learning problem because manual annotation is costly. Since existing VQA datasets are of limited scale, prior research has tried to leverage models pre-trained on ImageNet to mitigate this shortage. However, such models, trained for image classification, can be sub-optimal when applied to VQA data from a significantly different domain. In this paper, we make the first attempt at self-supervised pre-training for the VQA task built on contrastive learning, aiming to exploit plentiful unlabeled video data to learn feature representations in a simple yet effective way. Specifically, we first generate distorted video samples with diverse distortion characteristics and visual contents using the proposed distortion augmentation strategy. We then conduct contrastive learning to capture quality-aware information by maximizing the agreement between the feature representations of future frames and their corresponding predictions in the embedding space. In addition, we introduce a distortion type prediction task as an auxiliary learning objective to push the model toward discriminating the distortion categories of the input video. Solving these prediction tasks jointly with contrastive learning not only provides stronger surrogate supervision signals but also captures the knowledge shared among the tasks. Extensive experiments demonstrate that our approach sets a new state of the art in self-supervised learning for the VQA task. Our results also show that the learned pre-trained model can significantly benefit existing learning-based VQA models. Source code is available at https://github.com/cpf0079/CSPT.
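The abstract does not spell out the loss functions, but the described objective, maximizing agreement between predicted and actual future-frame representations while jointly classifying distortion types, matches an InfoNCE-style formulation. The sketch below is a minimal Python/PyTorch illustration under that assumption; the function names, shapes, `temperature`, and `alpha` weighting are hypothetical and not taken from the paper. Consult the authors' repository linked above for the actual implementation.

```python
# A minimal sketch of the joint objective described in the abstract, assuming a
# CPC-style setup: `pred` holds predicted embeddings of future frames, `target`
# holds their actual embeddings. All names and hyperparameters are illustrative
# assumptions, not from the paper.
import torch
import torch.nn.functional as F

def info_nce_loss(pred, target, temperature=0.1):
    """Contrastive term: each predicted future-frame embedding should agree
    with its own target embedding and disagree with the others in the batch."""
    pred = F.normalize(pred, dim=1)      # (B, D) predicted embeddings
    target = F.normalize(target, dim=1)  # (B, D) actual future-frame embeddings
    logits = pred @ target.t() / temperature                  # (B, B) similarities
    labels = torch.arange(pred.size(0), device=pred.device)  # positives on diagonal
    return F.cross_entropy(logits, labels)

def joint_loss(pred, target, distortion_logits, distortion_labels, alpha=1.0):
    """Joint objective: contrastive agreement plus distortion-type
    classification, solved together as the abstract describes."""
    contrastive = info_nce_loss(pred, target)
    distortion = F.cross_entropy(distortion_logits, distortion_labels)
    return contrastive + alpha * distortion
```

The diagonal of the similarity matrix pairs each prediction with its own future frame (the positive), while the off-diagonal entries serve as in-batch negatives; the auxiliary cross-entropy term is one plausible reading of the distortion prediction task.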

