Department of Electrical Engineering and Computer Science, University of Michigan Ann Arbor, MI, USA.
Department of Computer Science, University of Washington Seattle, WA, USA.
Front Psychol. 2014 May 20;5:417. doi: 10.3389/fpsyg.2014.00417. eCollection 2014.
Building fine-grained visual recognition systems that are capable of recognizing tens of thousands of categories, has received much attention in recent years. The well known semantic hierarchical structure of categories and concepts, has been shown to provide a key prior which allows for optimal predictions. The hierarchical organization of various domains and concepts has been subject to extensive research, and led to the development of the WordNet domains hierarchy (Fellbaum, 1998), which was also used to organize the images in the ImageNet (Deng et al., 2009) dataset, in which the category count approaches the human capacity. Still, for the human visual system, the form of the hierarchy must be discovered with minimal use of supervision or innate knowledge. In this work, we propose a new Bayesian generative model for learning such domain hierarchies, based on semantic input. Our model is motivated by the super-subordinate organization of domain labels and concepts that characterizes WordNet, and accounts for several important challenges: maintaining context information when progressing deeper into the hierarchy, learning a coherent semantic concept for each node, and modeling uncertainty in the perception process.
近年来,构建能够识别成千上万种类别的细粒度视觉识别系统受到了广泛关注。众所周知的类别和概念的语义层次结构已被证明是一个关键的先验知识,它允许进行最佳预测。各种领域和概念的层次结构已经进行了广泛的研究,并导致了 WordNet 领域层次结构的发展(Fellbaum,1998),该层次结构也被用于组织 ImageNet(Deng 等人,2009)数据集中的图像,其中类别数接近人类的能力。尽管如此,对于人类视觉系统来说,必须在最少使用监督或先天知识的情况下发现层次结构的形式。在这项工作中,我们提出了一种新的基于语义输入的贝叶斯生成模型,用于学习这种领域层次结构。我们的模型受到 WordNet 中域标签和概念的上下级组织的启发,并考虑了几个重要的挑战:在深入层次结构时保持上下文信息、为每个节点学习一致的语义概念,以及对感知过程中的不确定性进行建模。