IEEE Trans Image Process. 2017 Mar;26(3):1393-1404. doi: 10.1109/TIP.2017.2655449. Epub 2017 Jan 18.
Given a query photo issued by a user (q-user), landmark retrieval returns a set of photos whose landmarks are similar to those of the query. Existing studies on landmark retrieval focus on exploiting landmark geometries for similarity matching between candidate photos and the query photo. We observe that the same landmark, photographed by different users across a social media community, may convey different geometric information depending on viewpoint and/or angle, and may consequently yield very different retrieval results. In fact, handling landmarks whose shapes are of low quality due to the q-user's photography is often nontrivial and has seldom been studied. In this paper, we propose a novel framework, multi-query expansion, that retrieves semantically robust landmarks in two steps. First, we identify the top-k photos matching the latent topics of the query landmark to construct a multi-query set, remedying the query's possibly low-quality shape; for this purpose, we significantly extend the techniques of Latent Dirichlet Allocation. Second, motivated by typical collaborative filtering methods, we train a collaborative deep network to learn semantic, nonlinear, high-level features over the latent photo factors, which are obtained by matrix factorization of the collaborative user-photo matrix associated with the multi-query set. The learned deep network is then applied to generate features for all other photos, and also yields a compact multi-query set in the same space. Finally, ranking scores are computed in this high-level feature space between the multi-query set and all other photos, and the ranked photos serve as the final retrieval list.
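The two computational ingredients above — latent photo factors from matrix factorization of a user-photo matrix, and ranking all photos against the aggregated multi-query set in a latent feature space — can be illustrated with a minimal sketch. This is not the paper's method (which learns a collaborative deep network on top of the latent factors); it is a simplified stand-in using plain SGD matrix factorization and cosine ranking, with all function names and parameters chosen for illustration.

```python
import numpy as np

def factorize(R, k=8, steps=500, lr=0.05, reg=0.1, seed=0):
    """Factorize a user-photo interaction matrix R (m users x n photos)
    into U (m x k) and V (n x k) by SGD over observed entries.
    Rows of V act as latent photo factors."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = 0.1 * rng.standard_normal((m, k))
    V = 0.1 * rng.standard_normal((n, k))
    users, photos = np.nonzero(R)  # update only on observed interactions
    for _ in range(steps):
        for u, p in zip(users, photos):
            err = R[u, p] - U[u] @ V[p]
            U[u] += lr * (err * V[p] - reg * U[u])
            V[p] += lr * (err * U[u] - reg * V[p])
    return U, V

def rank_photos(multi_query_idx, V):
    """Average the latent factors of the multi-query set and rank all
    photos by cosine similarity to that centroid (a simple proxy for
    scoring in the learned high-level feature space)."""
    Vn = V / np.linalg.norm(V, axis=1, keepdims=True)
    centroid = Vn[multi_query_idx].mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    scores = Vn @ centroid
    return np.argsort(-scores)  # best-matching photos first
```

On a toy matrix with two disjoint user communities, photos co-liked with the multi-query photos rise to the top of the ranking, which is the behavior the collaborative step is meant to exploit.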
Extensive experiments are conducted on real-world social media data, comprising landmark photos together with their user information, to show superior performance over existing methods, in particular our recently proposed multi-query-based mid-level pattern representation method [1].