IEEE Trans Pattern Anal Mach Intell. 2012 Jan;34(1):66-78. doi: 10.1109/TPAMI.2011.103. Epub 2011 May 19.
SIFT-like local feature descriptors are ubiquitous in computer vision applications such as content-based retrieval, video analysis, copy detection, object recognition, photo tourism, and 3D reconstruction. Feature descriptors can be designed to be invariant to certain classes of photometric and geometric transformations, in particular affine and intensity scale transformations. In practice, two problems arise. First, the transformations that an image actually undergoes can only be approximately modeled in this way, so most descriptors are only approximately invariant. Second, descriptors are usually high dimensional (e.g., SIFT is represented as a 128-dimensional vector), which poses challenges for storing and retrieving descriptor data in large-scale retrieval and matching problems. We address both problems by mapping the descriptor vectors into the Hamming space, in which the Hamming metric is used to compare the resulting representations. This reduces the size of the descriptors by representing them as short binary strings, and allows descriptor invariance to be learned from examples. We show extensive experimental validation, demonstrating the advantage of the proposed approach.
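To make the idea of comparing descriptors in the Hamming space concrete, the following minimal Python sketch binarizes 128-dimensional vectors and compares them by Hamming distance. It rests on several assumptions that are not in the abstract: the projection here is random Gaussian rather than learned from examples, the 64-bit code length is arbitrary, the input vectors are synthetic rather than real SIFT descriptors, and all function names (`make_projection`, `binarize`, `hamming`, `demo`) are hypothetical.

```python
import random

DIM = 128     # SIFT descriptor length, as stated in the abstract
N_BITS = 64   # hypothetical code length, chosen only for illustration


def make_projection(n_bits, dim, seed=0):
    """Random Gaussian projection rows.

    Assumption: this is a stand-in for a projection learned from examples.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def binarize(descriptor, projection, thresholds=None):
    """Map a real vector to an integer with one bit per projection row.

    A bit is 1 when the projected value exceeds its threshold, else 0.
    """
    if thresholds is None:
        thresholds = [0.0] * len(projection)
    code = 0
    for row, t in zip(projection, thresholds):
        dot = sum(w * x for w, x in zip(row, descriptor))
        code = (code << 1) | (1 if dot > t else 0)
    return code


def hamming(a, b):
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def demo(seed=1):
    """Return (distance to a perturbed copy, distance to an unrelated vector)."""
    rng = random.Random(seed)
    projection = make_projection(N_BITS, DIM)
    base = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    # A slightly perturbed copy plays the role of the same patch
    # observed under a mild transformation.
    near = [x + rng.gauss(0.0, 0.01) for x in base]
    other = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    c_base, c_near, c_other = (binarize(d, projection) for d in (base, near, other))
    return hamming(c_base, c_near), hamming(c_base, c_other)


if __name__ == "__main__":
    d_near, d_other = demo()
    print("perturbed copy:", d_near, "unrelated vector:", d_other)
```

In this sketch each descriptor shrinks from 128 real values to a single 64-bit integer, and a comparison reduces to an XOR followed by a bit count. A learned projection and learned thresholds would replace `make_projection` and the default zero thresholds.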