IEEE Trans Pattern Anal Mach Intell. 2017 May;39(5):922-936. doi: 10.1109/TPAMI.2016.2564409. Epub 2016 May 6.
Visual observations of dynamic phenomena, such as human actions, are often represented as sequences of smoothly varying features. When the feature spaces can be structured as Riemannian manifolds, the corresponding representations become trajectories on manifolds. Analyzing these trajectories is challenging due to the non-linearity of the underlying spaces and the high dimensionality of the trajectories. In vision problems, given the nature of the physical systems involved, these phenomena are better characterized on a low-dimensional manifold than in the full space of Riemannian trajectories. For instance, in data involving human action analysis, if one does not impose the physical constraints of the human body, the resulting representation space will have highly redundant features. Learning an effective, low-dimensional embedding for action representations would have a significant impact on search and retrieval, visualization, learning, and recognition. Traditional manifold learning addresses this problem for static points in Euclidean space, but its extension to Riemannian trajectories is non-trivial and remains unexplored. The difficulty lies in the inherent non-linearity of the domain and the temporal variability of actions, which can distort any traditional metric between trajectories. To overcome these issues, we use a framework based on transported square-root velocity fields (TSRVF); this framework has several desirable properties, including a rate-invariant metric and vector space representations. We propose to learn an embedding such that each action trajectory is mapped to a single point in a low-dimensional Euclidean space, and trajectories that differ only in their temporal rates map to the same point. We utilize the TSRVF representation, and the accompanying statistical summaries of Riemannian trajectories, to extend existing coding methods such as PCA, KSVD, and Label Consistent KSVD to Riemannian trajectories or, more generally, to Riemannian functions.
We show that such coding efficiently captures trajectories in applications such as action recognition, stroke rehabilitation, visual speech recognition, clustering, and diverse sequence sampling. Using this framework, we obtain state-of-the-art recognition results while reducing the dimensionality/complexity by a factor of 100-250x. Since these mappings and codes are invertible, they can also be used to interactively visualize Riemannian trajectories and to synthesize actions.
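To make the coding idea concrete, here is a minimal sketch in plain Euclidean space. It is not the paper's full method: the TSRVF additionally requires parallel transport on a Riemannian manifold, and rate invariance requires warping alignment, both omitted here. The sketch only mimics the pipeline the abstract describes: compute a square-root velocity representation of each sampled trajectory, then map every trajectory to a single low-dimensional point via PCA. The `srvf` helper and the toy circle data are hypothetical, introduced purely for illustration.

```python
import numpy as np

def srvf(traj, dt):
    """Square-root velocity representation of a sampled curve in R^n:
    q(t) = f'(t) / sqrt(|f'(t)|), discretized with finite differences.
    (Hypothetical helper; the paper's TSRVF also parallel-transports
    these vectors to a common tangent space on the manifold.)"""
    v = np.gradient(traj, dt, axis=0)                    # velocity estimate
    speed = np.linalg.norm(v, axis=1, keepdims=True)
    return v / np.sqrt(np.maximum(speed, 1e-12))         # guard zero speed

# Toy "actions": 20 noisy circles, each sampled at T time steps in R^2.
rng = np.random.default_rng(0)
T = 100
t = np.linspace(0.0, 2.0 * np.pi, T)
trajs = np.stack([
    np.c_[np.cos(t), np.sin(t)] + 0.01 * rng.standard_normal((T, 2))
    for _ in range(20)
])

# Flatten each trajectory's SRVF and embed with PCA (via SVD):
# every trajectory becomes one point in a low-dimensional space.
Q = np.stack([srvf(x, t[1] - t[0]).ravel() for x in trajs])
Qc = Q - Q.mean(axis=0)
U, S, Vt = np.linalg.svd(Qc, full_matrices=False)
codes = Qc @ Vt[:3].T          # one 3-D code per trajectory
```

Because the SRVF map is invertible up to the starting point (integrate q|q| to recover the curve), codes of this kind can be decoded back to trajectories, which is the property the abstract exploits for visualization and synthesis.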