Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230026, China.
IEEE Trans Pattern Anal Mach Intell. 2012 May;34(5):863-75. doi: 10.1109/TPAMI.2011.195.
This paper proposes the Flickr Distance (FD) to measure the visual correlation between concepts. For each concept, a collection of related images is obtained from the Flickr website. We assume that each concept consists of several states, e.g., different views or different semantics, which are treated as latent topics. A latent topic visual language model (LTVLM) is then built to capture these states. The Flickr distance between two concepts is defined as the Jensen-Shannon (J-S) divergence between their LTVLMs. Unlike traditional conceptual distance measurements, which are based on Web textual documents, FD is based on visual information. Compared with the WordNet distance, FD scales easily as the conceptual corpus grows. Compared with the Normalized Google Distance (NGD) and Tag Concurrence Distance (TCD), FD uses visual information and can properly measure conceptual relations. We apply FD to multimedia-related tasks and find that methods based on FD significantly outperform those based on NGD and TCD. With the FD measurement, we also construct a large-scale visual conceptual network (VCNet) to store knowledge of conceptual relationships. Experiments show that FD is more consistent with human cognition and that it outperforms text-based distances in real-world applications.
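The Jensen-Shannon divergence used in the definition of FD can be illustrated with a minimal Python sketch. The abstract does not say how an LTVLM is represented or how its latent topics are combined, so the sketch makes two labeled assumptions: each latent topic is summarized as a discrete probability distribution over a shared visual-word vocabulary, and the concept-level distance is the average J-S divergence over all topic pairs. The function names `kl_divergence`, `js_divergence`, and `concept_distance` are hypothetical and are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p[i] == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Symmetric, and bounded in [0, 1] when the logarithm is base 2.
    """
    # The mixture m is positive wherever p or q is, so no division by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def concept_distance(topics_a, topics_b):
    """Average J-S divergence over all pairs of latent-topic distributions.

    ASSUMPTION: averaging over topic pairs is an illustrative choice,
    not something the abstract specifies.
    """
    pairs = [(p, q) for p in topics_a for q in topics_b]
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)


# Toy data: two concepts, each with two latent topics over a
# three-word visual vocabulary (values invented for illustration).
concept_a = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
concept_b = [[0.1, 0.2, 0.7], [0.2, 0.3, 0.5]]
print(concept_distance(concept_a, concept_b))
```

Because the J-S divergence is symmetric and bounded, a distance built this way needs no choice of a reference concept and stays comparable across concept pairs.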