Faraki Masoud, Harandi Mehrtash T, Porikli Fatih
IEEE Trans Neural Netw Learn Syst. 2018 Nov;29(11):5701-5712. doi: 10.1109/TNNLS.2018.2812799. Epub 2018 Mar 27.
Core to many learning pipelines is visual recognition, such as image and video classification. In such applications, having a compact yet rich and informative representation plays a pivotal role. An underlying assumption in traditional coding schemes [e.g., sparse coding (SC)] is that the data geometrically comply with the Euclidean space. In other words, the data are presented to the algorithm in vector form and Euclidean axioms are fulfilled. This is of course restrictive in machine learning, computer vision, and signal processing, as shown by a large number of recent studies. This paper takes a further step and provides a comprehensive mathematical framework to perform coding in curved and non-Euclidean spaces, i.e., Riemannian manifolds. To this end, we start with the simplest form of coding, namely, bag of words. Then, inspired by the success of the vector of locally aggregated descriptors in addressing computer vision problems, we introduce its Riemannian extensions. Finally, we study the Riemannian forms of SC, locality-constrained linear coding, and collaborative coding. Through rigorous tests, we demonstrate the superior performance of our Riemannian coding schemes against state-of-the-art methods on several visual classification tasks, including head pose classification, video-based face recognition, and dynamic scene recognition.
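As a rough illustration of the simplest scheme mentioned above, bag of words on a Riemannian manifold, the sketch below replaces the Euclidean nearest-codeword search with a geodesic one. It is not taken from the paper. It assumes the data are symmetric positive definite (SPD) matrices compared with the affine-invariant Riemannian metric, that a codebook is already available, and that hard assignment is used. The function names (`spd_inv_sqrt`, `airm_distance`, `riemannian_bow`) are hypothetical.

```python
import numpy as np


def spd_inv_sqrt(S):
    """Inverse square root of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    # Scale each eigenvector (column) by 1/sqrt of its eigenvalue.
    return (V / np.sqrt(w)) @ V.T


def airm_distance(X, Y):
    """Affine-invariant geodesic distance ||log(X^{-1/2} Y X^{-1/2})||_F.

    Assumption: X and Y are SPD matrices of the same size.
    """
    X_is = spd_inv_sqrt(X)
    M = X_is @ Y @ X_is
    M = (M + M.T) / 2.0  # symmetrize to suppress round-off asymmetry
    w = np.linalg.eigvalsh(M)
    return float(np.sqrt(np.sum(np.log(w) ** 2)))


def riemannian_bow(descriptors, codebook):
    """Hard-assignment histogram over codewords using geodesic distances.

    descriptors: iterable of SPD matrices extracted from one image or video.
    codebook:    list of SPD codewords (assumed to be given, e.g. from
                 clustering on the manifold; not computed here).
    """
    hist = np.zeros(len(codebook))
    for X in descriptors:
        dists = [airm_distance(X, C) for C in codebook]
        hist[int(np.argmin(dists))] += 1.0
    total = hist.sum()
    return hist / total if total > 0 else hist


# Toy usage with scaled identity matrices standing in for real descriptors.
I2 = np.eye(2)
codebook = [I2, 4.0 * I2]
descriptors = [1.1 * I2, 3.5 * I2, 5.0 * I2]
histogram = riemannian_bow(descriptors, codebook)
```

The only change relative to a Euclidean bag of words is the distance function, which is why this scheme is a natural starting point before the aggregation-based and sparse variants. Other manifolds or metrics would need their own geodesic distance in place of `airm_distance`.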