IEEE Trans Image Process. 2016 Sep;25(9):4406-4420. doi: 10.1109/TIP.2016.2590323. Epub 2016 Jul 11.
The scale-invariant feature transform (SIFT) is an effective image representation for computer vision tasks and one of the feature descriptions most resistant to common image deformations. However, two issues remain: first, a feature description based on gradient accumulation is not compact and contains redundancies; second, multiple orientations are often extracted from a single local region and therefore produce multiple descriptions, which hurts memory efficiency. To resolve these two issues, this paper introduces a novel method for determining the dominant orientation in multiple-orientation cases, named the discrete cosine transform (DCT) intrinsic orientation, and a new DCT-inspired feature transform (DIFT). In each local region, DIFT first computes a unique DCT intrinsic orientation via the DCT matrix and rotates the region accordingly, and then describes the rotated region with a subset of the DCT matrix coefficients to produce an optimized low-dimensional descriptor. We test the accuracy and robustness of DIFT on real image matching. Extensive experiments on public visual retrieval benchmarks then show that using the DCT intrinsic orientation achieves performance on a par with SIFT with only 60% of its features, and that replacing the SIFT description with DIFT reduces the descriptor dimension from 128 to 32 and improves precision. Image reconstruction from DIFT is also presented to show another of its advantages over SIFT.
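The idea of describing a local region with only part of its DCT coefficients can be illustrated with a minimal sketch. This is not the paper's DIFT: the abstract does not say which coefficients are kept, how the intrinsic orientation is derived, or how the descriptor is normalized. The sketch assumes an already rotated square patch, keeps the 32 lowest-frequency AC coefficients in anti-diagonal order, and L2-normalizes them. All function names (`dct_matrix`, `dct2`, `low_frequency_order`, `dct_descriptor`) and the sample patch are hypothetical, and the orientation step is omitted entirely.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C of size n x n (C times its transpose is the identity)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def dct2(patch):
    """2D DCT of a square patch, computed as C * patch * C^T."""
    c = dct_matrix(len(patch))
    return matmul(matmul(c, patch), transpose(c))

def low_frequency_order(n):
    """Index pairs (u, v) sorted by u + v, so low frequencies come first."""
    pairs = [(u, v) for u in range(n) for v in range(n)]
    return sorted(pairs, key=lambda p: (p[0] + p[1], p[0]))

def dct_descriptor(patch, dims=32):
    """Assumed selection rule: skip the DC term, keep the next `dims` coefficients."""
    coeffs = dct2(patch)
    order = low_frequency_order(len(patch))[1:dims + 1]
    vec = [coeffs[u][v] for u, v in order]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

# Hypothetical 8 x 8 patch standing in for a rotated local region.
patch = [[float((3 * i + 5 * j * j) % 17) for j in range(8)] for i in range(8)]
descriptor = dct_descriptor(patch)
print(len(descriptor))  # 32
```

Dropping the DC term and normalizing makes this toy descriptor insensitive to a uniform brightness offset and a positive contrast scaling of the patch. This is a property of the sketch, not a claim about DIFT.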