Ferecatu Marin, Geman Donald
TSI Department, Institut Telecom, Telecom Paristech, 46, rue Barrault, 75634 Paris, France.
IEEE Trans Pattern Anal Mach Intell. 2009 Jun;31(6):1087-101. doi: 10.1109/TPAMI.2008.259.
Starting from a member of an image database designated the "query image," traditional image retrieval techniques, for example, search by visual similarity, allow one to locate additional instances of a target category residing in the database. However, in many cases, the query image or, more generally, the target category, resides only in the mind of the user as a set of subjective visual patterns, psychological impressions, or "mental pictures." Consequently, since image databases available today are often unstructured and lack reliable semantic annotations, it is often not obvious how to initiate a search session; this is the "page zero problem." We propose a new statistical framework based on relevance feedback to locate an instance of a semantic category in an unstructured image database with no semantic annotation. A search session is initiated from a random sample of images. At each retrieval round, the user is asked to select one image from among a set of displayed images: the one that is, in the user's opinion, closest to the target class. The matching is then "mental." Performance is measured by the number of iterations necessary to display an image which satisfies the user, at which point standard techniques can be employed to display other instances. Our core contribution is a Bayesian formulation which scales to large databases. The two key components are a response model which accounts for the user's subjective perception of similarity and a display algorithm which seeks to maximize the flow of information. Experiments with real users and two databases of 20,000 and 60,000 images demonstrate the efficiency of the search process.
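The session loop described in the abstract (random first display, one user choice per round, Bayesian update, new display) can be illustrated with a minimal sketch. It is not the authors' method: the softmax response model over Euclidean feature distances, the display rule that samples from the posterior, the simulated user, the use of a single target image in place of a target category, and all function and parameter names (`response_likelihood`, `update_posterior`, `choose_display`, `run_session`, `sigma`) are assumptions made for illustration only.

```python
import numpy as np


def response_likelihood(features, display, chosen, sigma=1.0):
    """P(user picks display[chosen] | target = k), for every image k.

    Assumed response model: softmax over negative Euclidean distances
    between the hypothetical target k and each displayed image.
    """
    shown = features[display]                                   # (m, dim)
    dist = np.linalg.norm(features[:, None, :] - shown[None, :, :], axis=2)
    logits = -dist / sigma
    logits -= logits.max(axis=1, keepdims=True)                 # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs[:, chosen]                                     # (n,)


def update_posterior(posterior, features, display, chosen, sigma=1.0):
    """One Bayes step: posterior is proportional to prior times likelihood."""
    post = posterior * response_likelihood(features, display, chosen, sigma)
    return post / post.sum()


def choose_display(posterior, n_display, rng):
    """Heuristic stand-in for an information-maximizing display rule:
    sample distinct images in proportion to the current posterior."""
    p = posterior + 1e-12          # guard against entries that underflow to zero
    p /= p.sum()
    return rng.choice(len(p), size=n_display, replace=False, p=p)


def run_session(features, target, n_display=8, max_rounds=50, sigma=1.0, seed=0):
    """Simulate a session; return (rounds until target is displayed, posterior)."""
    rng = np.random.default_rng(seed)
    n = len(features)
    posterior = np.full(n, 1.0 / n)          # uniform prior: first display is random
    for round_idx in range(1, max_rounds + 1):
        display = choose_display(posterior, n_display, rng)
        if target in display:
            return round_idx, posterior
        # Simulated user: picks the displayed image nearest to the target.
        dist = np.linalg.norm(features[display] - features[target], axis=1)
        chosen = int(np.argmin(dist))
        posterior = update_posterior(posterior, features, display, chosen, sigma)
    return None, posterior
```

In this sketch `sigma` controls how noisy the simulated perception of similarity is: a large value makes every response nearly uninformative, while a small value makes the posterior concentrate quickly around images close to the chosen one. The broadcasted distance computation is only practical for small collections; scaling to tens of thousands of images, as in the reported experiments, would need a more economical scheme.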