Spatial Pyramid Covariance-Based Compact Video Code for Robust Face Retrieval in TV-Series

Publication information

IEEE Trans Image Process. 2016 Dec;25(12):5905-5919. doi: 10.1109/TIP.2016.2616297. Epub 2016 Oct 10.

DOI:10.1109/TIP.2016.2616297

Abstract

We address the problem of face video retrieval in TV-series: given one face track of a specific character, retrieve the video clips in which that character appears. The task is challenging for two reasons. First, faces in TV-series are captured under largely uncontrolled conditions and exhibit complex appearance variations. Second, retrieval typically requires an efficient representation with low time and space complexity. To handle this problem, we propose a compact and discriminative representation for large volumes of video data, named compact video code (CVC). Our method first models a face track by its sample (i.e., frame) covariance matrix to capture the variations in the video data statistically. To incorporate discriminative information and obtain a more compact video signature suitable for retrieval, the high-dimensional covariance representation is further encoded as a much lower-dimensional binary vector, which yields the proposed CVC. Specifically, each bit of the code, i.e., each dimension of the binary vector, is produced via supervised learning in a max-margin framework that aims to balance the discriminability and stability of the code. In addition, we extend the descriptive granularity of the covariance matrix from the traditional pixel level to a more general patch level, and propose a novel hierarchical video representation, named spatial pyramid covariance, along with a fast calculation method. Face retrieval experiments on two challenging TV-series video databases, The Big Bang Theory and Prison Break, demonstrate the competitiveness of the proposed CVC against state-of-the-art retrieval methods. Furthermore, as a general video matching algorithm, CVC is also evaluated on the traditional video face recognition task using a standard Internet database, YouTube Celebrities, where it shows promising performance with an extremely compact code of only 128 bits.
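The pipeline described in the abstract (frame covariance of a face track, a patch-level pyramid of covariances, and a short binary code compared by Hamming distance) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Several pieces are assumptions made only for illustration: the patch feature is the mean intensity of each grid cell, the pyramid levels are 2x2 and 4x4 grids, a small ridge term `eps` regularizes the covariance, and the bits come from random linear projections as a stand-in for the supervised max-margin learning that the paper uses. All function names are hypothetical.

```python
import numpy as np


def track_covariance(frames: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Sample covariance of a face track.

    frames: (n_frames, d) array, one d-dimensional feature vector per frame.
    eps * I is added as a ridge term (an assumption of this sketch) so the
    result stays positive definite even when n_frames < d.
    """
    frames = np.asarray(frames, dtype=np.float64)
    centered = frames - frames.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(frames.shape[0] - 1, 1)
    return cov + eps * np.eye(frames.shape[1])


def patch_means(track: np.ndarray, grid: int) -> np.ndarray:
    """Patch-level features: mean intensity of each cell of a grid x grid split.

    track: (n_frames, H, W) grayscale frames, with H and W divisible by grid.
    Returns an (n_frames, grid * grid) array.
    """
    n, h, w = track.shape
    cells = track.reshape(n, grid, h // grid, grid, w // grid)
    return cells.mean(axis=(2, 4)).reshape(n, grid * grid)


def pyramid_covariance_vector(track: np.ndarray, grids=(2, 4)) -> np.ndarray:
    """Concatenate the upper-triangular covariance entries of every level."""
    parts = []
    for g in grids:
        cov = track_covariance(patch_means(track, g))
        parts.append(cov[np.triu_indices(cov.shape[0])])
    return np.concatenate(parts)


def binary_code(vec: np.ndarray, projections: np.ndarray) -> np.ndarray:
    """One bit per projection row (sign of the projection), packed into bytes.

    projections: (n_bits, dim). Random here; learned with supervision in the paper.
    """
    bits = (projections @ vec > 0).astype(np.uint8)
    return np.packbits(bits)


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two packed binary codes."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())
```

With a projection matrix of 128 rows, each track is stored in 16 bytes, and ranking a gallery against a query reduces to computing `hamming(query_code, gallery_code)` for every gallery track and sorting by the result.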

Similar articles

Face Video Retrieval Based on the Deep CNN With RBF Loss.

IEEE Trans Image Process. 2021;30:1015-1029. doi: 10.1109/TIP.2020.3040847. Epub 2020 Dec 9.

Discriminant Analysis on Riemannian Manifold of Gaussian Distributions for Face Recognition With Image Sets.

IEEE Trans Image Process. 2018;27(1):151-163. doi: 10.1109/TIP.2017.2746993.

Discriminative Codebook Hashing for Supervised Video Retrieval.

Comput Intell Neurosci. 2021 Aug 25;2021:5845094. doi: 10.1155/2021/5845094. eCollection 2021.

Hierarchical Recurrent Neural Hashing for Image Retrieval With Hierarchical Convolutional Features.

IEEE Trans Image Process. 2018;27(1):106-120. doi: 10.1109/TIP.2017.2755766.

Unsupervised feature disentanglement for video retrieval in minimally invasive surgery.

Med Image Anal. 2022 Jan;75:102296. doi: 10.1016/j.media.2021.102296. Epub 2021 Nov 3.

Self-Supervised Video-Centralised Transformer for Video Face Clustering.

IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):12944-12959. doi: 10.1109/TPAMI.2023.3243812. Epub 2023 Oct 3.

Learning Short Binary Codes for Large-scale Image Retrieval.

IEEE Trans Image Process. 2017 Mar;26(3):1289-1299. doi: 10.1109/TIP.2017.2651390. Epub 2017 Jan 11.

Learning Compact Binary Face Descriptor for Face Recognition.

IEEE Trans Pattern Anal Mach Intell. 2015 Oct;37(10):2041-56. doi: 10.1109/TPAMI.2015.2408359.

Self-Supervised Motion Perception for Spatiotemporal Representation Learning.

IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):9832-9846. doi: 10.1109/TNNLS.2022.3160860. Epub 2023 Nov 30.

PMID:27740484
