Integrating Convolution and Sparse Coding for Learning Low-Dimensional Discriminative Image Representations.

Authors

Wei Xian, Liu Yingjie, Tang Xuan, Yu Shui, Chen Mingsong

Publication information

IEEE Trans Neural Netw Learn Syst. 2025 Jul;36(7):12483-12496. doi: 10.1109/TNNLS.2024.3453374.

Abstract

This work investigates the problem of efficiently learning discriminative low-dimensional (LD) representations of multiclass image objects. We propose a generic end-to-end approach, named SparConvLow, that jointly optimizes a sparse dictionary and convolutions to learn low-dimensional discriminative image representations, taking advantage of convolutional neural networks (CNNs), dictionary learning, and orthogonal projections. The learning process can be summarized as follows. First, a CNN module extracts high-dimensional (HD) preliminary convolutional features. Second, to avoid the high computational cost of sparse coding directly on HD CNN features, we learn a sparse representation (SR) over a task-driven dictionary in the space obtained by orthogonally projecting the features. Third, we apply a discriminative projection to the SR. The whole process is treated as a single end-to-end joint optimization problem of trace quotient maximization. The cost function is well defined on the product of the CNN parameter space, the Stiefel manifold, the oblique manifold, and the Grassmann manifold. Using explicit gradient delivery, the cost function is optimized with a geometric stochastic gradient descent (SGD) algorithm together with the chain rule and backpropagation. Experimental results show that the proposed method achieves performance highly competitive with state-of-the-art (SOTA) image classification, object categorization, and face recognition methods under both supervised and semi-supervised settings. The code is available at https://github.com/MVPR-Group/SparConvLow.
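
The abstract names two ingredients that a small example can make concrete: a trace quotient objective and gradient steps constrained to the Stiefel manifold (matrices with orthonormal columns). The NumPy sketch below is an illustration only and is not taken from the SparConvLow repository. The function names, the use of between-class and within-class scatter matrices as numerator and denominator, the QR retraction, the fixed step size, and the `eps` regularizer are all assumptions made for this example. The CNN, the sparse coding stage, and the oblique and Grassmann factors are omitted.

```python
import numpy as np


def qr_retract(M):
    """Map a full-column-rank matrix onto the Stiefel manifold via QR (assumed retraction)."""
    Q, R = np.linalg.qr(M)
    # Flip column signs so that diag(R) is positive, which makes the result unique.
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs


def scatter_matrices(X, y):
    """Between-class and within-class scatter; samples are the columns of X."""
    d = X.shape[0]
    mean = X.mean(axis=1, keepdims=True)
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[:, y == c]
        mc = Xc.mean(axis=1, keepdims=True)
        S_b += Xc.shape[1] * (mc - mean) @ (mc - mean).T
        S_w += (Xc - mc) @ (Xc - mc).T
    return S_b, S_w


def trace_quotient(U, S_b, S_w, eps=1e-6):
    """tr(U^T S_b U) / (tr(U^T S_w U) + eps); eps is a hypothetical regularizer."""
    return np.trace(U.T @ S_b @ U) / (np.trace(U.T @ S_w @ U) + eps)


def stiefel_ascent_step(U, S_b, S_w, lr=0.01, eps=1e-6):
    """One gradient ascent step on the trace quotient, kept on the Stiefel manifold."""
    num = np.trace(U.T @ S_b @ U)
    den = np.trace(U.T @ S_w @ U) + eps
    # Euclidean gradient of num / den with respect to U (S_b and S_w are symmetric).
    G = 2.0 * (S_b @ U) / den - 2.0 * num * (S_w @ U) / den**2
    # Project the gradient onto the tangent space at U.
    sym = 0.5 * (U.T @ G + G.T @ U)
    rgrad = G - U @ sym
    # Move along the tangent direction, then retract back onto the manifold.
    return qr_retract(U + lr * rgrad)
```

Each step projects the Euclidean gradient onto the tangent space and retracts with QR, so the iterate keeps orthonormal columns without a separate re-orthogonalization pass. A stochastic variant would compute the scatter matrices from a mini-batch at every step.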
