Wang Rui, Wu Xiao-Jun, Chen Ziheng, Hu Cong, Kittler Josef
IEEE Trans Neural Netw Learn Syst. 2024 Jul;35(7):8924-8938. doi: 10.1109/TNNLS.2022.3216811. Epub 2024 Jul 8.
By characterizing each image set as a nonsingular covariance matrix on the symmetric positive definite (SPD) manifold, approaches to visual content classification with image sets have made impressive progress. However, the key challenge of unhelpfully large intraclass variability and interclass similarity of representations remains open to date. Although several recent studies have mitigated the two problems by jointly learning the embedding mapping and the similarity metric on the original SPD manifold, their inherently shallow and linear feature transformation mechanisms are not powerful enough to capture useful geometric features, especially in complex scenarios. To this end, this article explores a novel approach, termed SPD manifold deep metric learning (SMDML), for image set classification. Specifically, SMDML first selects a prevailing SPD manifold neural network (SPDNet) as the backbone (encoder) to derive an SPD matrix nonlinear representation. To counteract the degradation of structural information during multistage feature embedding, we construct a Riemannian decoder at the end of the encoder, trained by a reconstruction error term (RT), to induce the generated low-dimensional feature manifold of the hidden layer to capture the pivotal information about the visual data describing the imaged scene. We demonstrate through theory and experiments that it is feasible to replace the Riemannian metric with Euclidean distance in RT. Then, the ReCov layer is introduced into the established Riemannian network to regularize the local statistical information within each input feature matrix, which enhances the effectiveness of the learning process. The theoretical analysis of the activation function used in the ReCov layer in terms of continuity and conditions for generating positive definite matrices is beneficial for network design.
Inspired by the fact that the single cross-entropy loss used for training is unable to effectively parse the geometric distribution of the deep representations, we finally endow the suggested model with a novel metric learning regularization term. By explicitly incorporating the encoding and processing of the data variations into the network learning process, this term can not only derive a powerful Riemannian representation but also train an effective classifier. The experimental results show the superiority of the proposed approach on three typical visual classification tasks.
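As a rough illustration of the representation the abstract starts from, the sketch below builds a nonsingular covariance descriptor for one image set and compares two such descriptors with a Riemannian-style distance and with the plain Euclidean (Frobenius) distance. It is not the authors' implementation. The function names, the trace-scaled ridge weight `eps`, and the choice of the log-Euclidean metric as the Riemannian distance are all assumptions made for this example; the abstract does not say which Riemannian metric the paper uses.

```python
import numpy as np


def set_covariance(features: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Covariance descriptor of one image set.

    features: (n_images, d) array, one feature vector per image.
    eps: hypothetical ridge weight that keeps the matrix nonsingular.
    """
    n, d = features.shape
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(n - 1, 1)
    # Add a small trace-scaled multiple of the identity so the result is
    # strictly positive definite even when n_images < d.
    return cov + eps * (np.trace(cov) / d + 1e-12) * np.eye(d)


def spd_log(mat: np.ndarray) -> np.ndarray:
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(mat)
    return (v * np.log(w)) @ v.T


def log_euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Log-Euclidean distance (assumed here as the Riemannian metric)."""
    return float(np.linalg.norm(spd_log(a) - spd_log(b), ord="fro"))


def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Plain Frobenius distance between two matrices."""
    return float(np.linalg.norm(a - b, ord="fro"))


# Two synthetic "image sets" with 40 images and 5-dimensional features each.
rng = np.random.default_rng(0)
cov_a = set_covariance(rng.normal(size=(40, 5)))
cov_b = set_covariance(2.0 * rng.normal(size=(40, 5)))
print("log-Euclidean:", log_euclidean_distance(cov_a, cov_b))
print("Euclidean:    ", euclidean_distance(cov_a, cov_b))
```

The Euclidean distance avoids the eigendecomposition that the matrix logarithm needs, which is one practical reason a reconstruction term might prefer it. Whether it is an adequate substitute is the question the paper addresses through theory and experiments.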