
Spatial Pyramid Covariance-Based Compact Video Code for Robust Face Retrieval in TV-Series.

Publication information

IEEE Trans Image Process. 2016 Dec;25(12):5905-5919. doi: 10.1109/TIP.2016.2616297. Epub 2016 Oct 10.

Abstract

We address the problem of face video retrieval in TV-series, which searches for video clips based on the presence of a specific character, given one face track of that character. This is tremendously challenging because, on the one hand, faces in TV-series are captured in largely uncontrolled conditions with complex appearance variations, and on the other hand, the retrieval task typically needs an efficient representation with low time and space complexity. To handle this problem, we propose a compact and discriminative representation for the huge body of video data, named compact video code (CVC). Our method first models the face track by its sample (i.e., frame) covariance matrix to capture the video data variations in a statistical manner. To incorporate discriminative information and obtain a more compact video signature suitable for retrieval, the high-dimensional covariance representation is further encoded as a much lower-dimensional binary vector, which finally yields the proposed CVC. Specifically, each bit of the code, i.e., each dimension of the binary vector, is produced via supervised learning in a max-margin framework, which aims to strike a balance between the discriminability and stability of the code. Furthermore, we extend the descriptive granularity of the covariance matrix from the traditional pixel level to the more general patch level, and propose a novel hierarchical video representation named spatial pyramid covariance, along with a fast calculation method. Face retrieval experiments on two challenging TV-series video databases, i.e., the Big Bang Theory and Prison Break, demonstrate the competitiveness of the proposed CVC over state-of-the-art retrieval methods. In addition, as a general video matching algorithm, CVC is also evaluated on the traditional video face recognition task on a standard Internet database, i.e., YouTube Celebrities, where it shows quite promising performance using an extremely compact code of only 128 bits.
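As a rough illustration of the pipeline described in the abstract (covariance of a face track, then a short binary code compared by Hamming distance), the following is a minimal sketch and not the authors' implementation. The function names `track_covariance`, `binary_code`, and `hamming` are hypothetical. The per-bit hyperplanes here are random placeholders, whereas the paper learns each bit with supervision in a max-margin framework. The sketch also uses plain per-frame feature vectors and omits the patch-level spatial pyramid covariance.

```python
import numpy as np


def track_covariance(frames):
    """Sample covariance of a face track.

    frames: array of shape (n_frames, d), one feature vector per frame.
    Returns a (d, d) matrix, normalized by n_frames - 1.
    """
    X = np.asarray(frames, dtype=np.float64)
    centered = X - X.mean(axis=0, keepdims=True)
    return centered.T @ centered / (X.shape[0] - 1)


def binary_code(cov, projections):
    """Encode a covariance matrix as a binary vector, one hyperplane per bit.

    The upper triangle of cov (including the diagonal) is vectorized and
    thresholded at zero along each row of `projections`.
    ASSUMPTION: `projections` stands in for the learned hyperplanes.
    """
    rows, cols = np.triu_indices(cov.shape[0])
    vec = cov[rows, cols]
    return (projections @ vec > 0).astype(np.uint8)


def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))


# Toy usage on synthetic data (not real face features).
rng = np.random.default_rng(0)
d, n_bits = 16, 128
projections = rng.normal(size=(n_bits, d * (d + 1) // 2))  # placeholder hyperplanes

query = binary_code(track_covariance(rng.normal(size=(40, d))), projections)
gallery = [
    binary_code(track_covariance(rng.normal(size=(40, d))), projections)
    for _ in range(5)
]
# Rank gallery tracks by Hamming distance to the query code.
ranking = sorted(range(len(gallery)), key=lambda i: hamming(query, gallery[i]))
```

With 128-bit codes, each comparison reduces to counting differing bits, which is the source of the low time and space cost that the abstract emphasizes for retrieval.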
