IEEE Trans Pattern Anal Mach Intell. 2021 Aug;43(8):2582-2597. doi: 10.1109/TPAMI.2020.2974833. Epub 2021 Jul 1.
Compared with the global average pooling used in existing deep convolutional neural networks (CNNs), global covariance pooling can capture richer statistics of deep features and thus has the potential to improve the representation and generalization abilities of deep CNNs. However, integrating global covariance pooling into deep CNNs brings two challenges: (1) robust covariance estimation given deep features of high dimension and small sample size; (2) appropriate use of the geometry of covariances. To address these challenges, we propose global Matrix Power Normalized COVariance (MPN-COV) pooling. MPN-COV conforms to a robust covariance estimator that is well suited to the high-dimension, small-sample-size scenario. It can also be regarded as a Power-Euclidean metric between covariances, effectively exploiting their geometry. Furthermore, a global Gaussian embedding network is proposed to incorporate first-order statistics into MPN-COV. For fast training of MPN-COV networks, we implement an iterative matrix square root normalization, avoiding the GPU-unfriendly eigen-decomposition inherent in MPN-COV. Additionally, progressive 1×1 convolutions and group convolution are introduced to compress covariance representations. The proposed methods are highly modular and can be readily plugged into existing deep CNNs. Extensive experiments are conducted on large-scale object classification, scene categorization, fine-grained visual recognition and texture classification, showing that our methods outperform their counterparts and obtain state-of-the-art performance.
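The pooling and normalization steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the features of one image arrive as a `(d, n)` array (d channels, n spatial positions), that the matrix power is 1/2, and that the "iterative matrix square root normalization" takes the form of a coupled Newton-Schulz iteration with trace pre-normalization and post-compensation; the abstract does not specify any of these details. The function names `covariance_pool` and `sqrt_newton_schulz` and the parameter `num_iters` are hypothetical.

```python
import numpy as np


def covariance_pool(features):
    """Sample covariance of a (d, n) feature array: d channels, n positions."""
    n = features.shape[1]
    centered = features - features.mean(axis=1, keepdims=True)
    return centered @ centered.T / n


def sqrt_newton_schulz(cov, num_iters=5):
    """Approximate the matrix square root of a symmetric positive definite
    matrix using only matrix products (no eigen-decomposition).

    Assumption: a coupled Newton-Schulz iteration; the abstract only says
    "iterative matrix square root normalization".
    """
    d = cov.shape[0]
    identity = np.eye(d)
    trace = np.trace(cov)
    # Pre-normalize by the trace so all eigenvalues lie in (0, 1],
    # which keeps the iteration inside its convergence region.
    y = cov / trace
    z = identity.copy()
    for _ in range(num_iters):
        t = 0.5 * (3.0 * identity - z @ y)
        y = y @ t  # y converges to sqrt(cov / trace)
        z = t @ z  # z converges to the inverse of that square root
    # Post-compensate to undo the trace normalization.
    return y * np.sqrt(trace)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((4, 200))  # toy stand-in for deep features
    cov = covariance_pool(feats)
    root = sqrt_newton_schulz(cov, num_iters=30)
    print("max reconstruction error:", np.abs(root @ root - cov).max())
```

The loop uses nothing but matrix multiplications, which is the property that makes an iterative scheme of this kind friendlier to GPUs than eigen-decomposition. The toy input has many more positions than channels so that the covariance is well conditioned; in the high-dimension, small-sample-size regime the abstract targets, the covariance can be rank deficient and the behavior of this sketch would need separate checking.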