Ronin Institute, Montclair, NJ 07043, USA.
Sensors (Basel). 2022 Mar 12;22(6):2209. doi: 10.3390/s22062209.
With the constantly growing popularity of video-based services and applications, no-reference video quality assessment (NR-VQA) has become a highly active research topic. Over the years, many different approaches have been introduced in the literature to evaluate the perceptual quality of digital videos. With the advent of large benchmark video quality assessment databases, deep learning has attracted significant attention in this field in recent years. This paper presents a novel deep learning-based approach to NR-VQA that relies on a set of pre-trained convolutional neural networks (CNNs) applied in parallel to characterize a wide range of potential image and video distortions. Specifically, temporally pooled and saliency-weighted video-level deep features are extracted with the help of a set of pre-trained CNNs and mapped onto perceptual quality scores independently of each other. Finally, the quality scores coming from the different regressors are fused to obtain the perceptual quality of a given video sequence. Extensive experiments demonstrate that the proposed method sets a new state of the art on two large benchmark video quality assessment databases with authentic distortions. Moreover, the presented results underline that the decision fusion of multiple deep architectures can significantly benefit NR-VQA.
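The abstract outlines a pipeline of parallel pre-trained CNN feature extractors, temporal pooling of frame-level features, per-network quality regressors, and score fusion. The sketch below illustrates that flow under stated assumptions: the backbone choices (ResNet-50 and DenseNet-121), plain mean pooling in place of the paper's saliency-weighted temporal pooling, Ridge regression, and uniform score averaging as the fusion rule are all illustrative stand-ins, not the paper's actual configuration.

```python
# Minimal sketch of the decision-fusion NR-VQA pipeline described in the
# abstract. All concrete choices (backbones, pooling, regressor, fusion
# rule) are illustrative assumptions, not the paper's configuration.
import torch
import torch.nn.functional as F
import torchvision.models as models
from sklearn.linear_model import Ridge

# A set of pre-trained CNNs used in parallel as frame-level feature extractors.
backbones = {
    "resnet50": models.resnet50(weights="IMAGENET1K_V1").eval(),
    "densenet121": models.densenet121(weights="IMAGENET1K_V1").eval(),
}

def video_level_features(frames: torch.Tensor, net: torch.nn.Module) -> torch.Tensor:
    """Pool frame-level deep features into one video-level vector.

    frames: (T, 3, H, W) tensor of preprocessed video frames. Plain mean
    pooling stands in here for the paper's saliency-weighted temporal pooling.
    """
    body = torch.nn.Sequential(*list(net.children())[:-1])  # drop classifier head
    with torch.no_grad():
        feats = body(frames)                                 # (T, D, h, w) activations
        feats = F.adaptive_avg_pool2d(feats, 1).flatten(1)   # (T, D) per-frame features
    return feats.mean(dim=0)                                 # (D,) video-level feature

# One regressor per backbone, each mapping video-level features to quality
# scores independently; assumed already fitted on training features and MOS.
regressors = {name: Ridge(alpha=1.0) for name in backbones}

def predict_quality(frames: torch.Tensor) -> float:
    """Fuse the per-network quality predictions into one final score."""
    scores = [
        regressors[name].predict(
            video_level_features(frames, net).numpy().reshape(1, -1)
        )[0]
        for name, net in backbones.items()
    ]
    return sum(scores) / len(scores)                         # simple average fusion
```

Averaging is only the simplest possible fusion rule; the key structural point from the abstract is that each backbone's features are regressed onto quality scores separately, and fusion happens at the decision (score) level rather than at the feature level.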