IEEE Trans Pattern Anal Mach Intell. 2018 Dec;40(12):2827-2840. doi: 10.1109/TPAMI.2017.2776154. Epub 2017 Nov 22.
Riemannian manifolds have been widely employed for video representations in visual classification tasks, including video-based face recognition. This success derives mainly from learning a discriminant Riemannian metric that encodes the non-linear geometry of the underlying Riemannian manifold. In this paper, we propose a novel metric learning framework that learns a distance metric across a Euclidean space and a Riemannian manifold in order to fuse the average appearance and the pattern variation of the faces within one video. The proposed framework handles the three typical settings of video-based face recognition: Video-to-Still, Still-to-Video and Video-to-Video. To realize this framework, we exploit typical Riemannian geometries for kernel embedding and map the source Euclidean space and the Riemannian manifold into a common Euclidean subspace, each through a corresponding high-dimensional Reproducing Kernel Hilbert Space (RKHS). With this mapping, the problem of learning a cross-view metric between the two heterogeneous source spaces is converted into learning a single-view Euclidean distance metric in the common target space. By learning from heterogeneous data that share the same labels, the framework obtains a discriminant metric in the common space that improves face recognition from videos. Extensive experiments on four challenging video face databases demonstrate that the proposed framework has a clear advantage over state-of-the-art methods in all three classical video-based face recognition scenarios.
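The kernel-embedding idea can be illustrated with a minimal sketch. It assumes, for illustration only, that the Euclidean view of a video is its mean frame feature, that the Riemannian view is its regularized covariance matrix on the manifold of symmetric positive definite (SPD) matrices, and that the two RKHS embeddings use a Gaussian kernel and a Gaussian kernel on the Log-Euclidean distance, respectively. The functions `video_views`, `spd_logm`, `gaussian_kernel`, `log_euclidean_kernel` and `cross_view_distance`, and the projection matrices `W_e` and `W_r`, are hypothetical names; the projections are random placeholders standing in for the matrices the framework would learn, so the sketch shows only how a cross-view distance is computed once both views land in a common Euclidean subspace, not the paper's learning objective.

```python
import numpy as np

def video_views(frames, eps=1e-3):
    """Two views of one video given as an (n_frames, d) feature matrix:
    the mean appearance (a point in R^d) and a regularized covariance
    matrix (a point on the manifold of d x d SPD matrices)."""
    mean = frames.mean(axis=0)
    centered = frames - mean
    cov = centered.T @ centered / max(frames.shape[0] - 1, 1)
    # Ridge term keeps the covariance strictly positive definite.
    cov = cov + eps * np.eye(frames.shape[1])
    return mean, cov

def spd_logm(S):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T  # V diag(log w) V^T

def gaussian_kernel(X, Y, sigma):
    """Gaussian (RBF) kernel between the rows of X and the rows of Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def log_euclidean_kernel(As, Bs, sigma):
    """Gaussian kernel on the Log-Euclidean distance ||log(A) - log(B)||_F."""
    LA = np.stack([spd_logm(A).ravel() for A in As])
    LB = np.stack([spd_logm(B).ravel() for B in Bs])
    return gaussian_kernel(LA, LB, sigma)

def cross_view_distance(k_e, k_r, W_e, W_r):
    """Squared Euclidean distance in the common subspace between a Euclidean
    sample (kernel row k_e) and an SPD sample (kernel row k_r)."""
    z_e = W_e.T @ k_e
    z_r = W_r.T @ k_r
    return float(np.sum((z_e - z_r) ** 2))

# Toy data: feature dim, number of training videos, common-subspace dim (all hypothetical).
rng = np.random.default_rng(0)
d, n_train, m = 5, 8, 3
videos = [rng.normal(size=(20, d)) for _ in range(n_train)]
means, covs = zip(*(video_views(v) for v in videos))
means = np.stack(means)

# Kernel rows against the training set serve as the RKHS coordinates of each view.
K_e = gaussian_kernel(means, means, sigma=1.0)
K_r = log_euclidean_kernel(covs, covs, sigma=1.0)

# Placeholder projections; in the framework these would be learned from labeled data.
W_e = rng.normal(size=(n_train, m))
W_r = rng.normal(size=(n_train, m))

# Still-to-Video style comparison: a still image (Euclidean view only)
# against the covariance (Riemannian view) of training video 0.
still = rng.normal(size=(1, d))
k_still = gaussian_kernel(still, means, sigma=1.0)[0]
print(cross_view_distance(k_still, K_r[0], W_e, W_r))
```

Representing each sample by its kernel values against a training set is one common way to make an RKHS embedding finite-dimensional; once both views are projected into the same space, ordinary Euclidean distances apply.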