Torralba Antonio, Fergus Rob, Freeman William T
Computer Science and Artificial Intelligence Lab (CSAIL), Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA.
IEEE Trans Pattern Anal Mach Intell. 2008 Nov;30(11):1958-70. doi: 10.1109/TPAMI.2008.128.
With the advent of the Internet, billions of images are now freely available online and constitute a dense sampling of the visual world. Using a variety of non-parametric methods, we explore this world with the aid of a large dataset of 79,302,017 images collected from the Internet. Motivated by psychophysical results showing the remarkable tolerance of the human visual system to degradations in image resolution, the images in the dataset are stored as 32 × 32 color images. Each image is loosely labeled with one of the 75,062 non-abstract nouns in English, as listed in the WordNet lexical database. Hence the image database gives comprehensive coverage of all object categories and scenes. The semantic information from WordNet can be used in conjunction with nearest-neighbor methods to perform object classification over a range of semantic levels, minimizing the effects of labeling noise. For certain classes that are particularly prevalent in the dataset, such as people, we are able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.
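The non-parametric approach described above can be illustrated with a minimal sketch: classify a query by finding its nearest neighbors among flattened 32 × 32 color images and taking a majority vote over their (noisy) noun labels. The sum-of-squared-differences metric, the synthetic data, and the label set below are illustrative assumptions, not the authors' exact pipeline (which also exploits WordNet's semantic hierarchy to pool votes across related nouns).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "dataset": N tiny images flattened to 32*32*3 vectors,
# each loosely labeled with one noun (stand-in for WordNet labels).
N = 1000
dataset = rng.random((N, 32 * 32 * 3)).astype(np.float32)
labels = rng.choice(["person", "dog", "tree", "car"], size=N)

def nearest_neighbor_label(query, k=25):
    """Return the majority label among the k nearest images by SSD.

    Voting over k neighbors (rather than trusting the single nearest
    image) is what dampens the effect of labeling noise.
    """
    ssd = np.sum((dataset - query) ** 2, axis=1)  # SSD to every stored image
    nearest = np.argsort(ssd)[:k]                 # indices of the k closest
    votes, counts = np.unique(labels[nearest], return_counts=True)
    return votes[np.argmax(counts)]

query = rng.random(32 * 32 * 3).astype(np.float32)
print(nearest_neighbor_label(query))
```

In the paper's setting, the power of this trivially simple classifier comes from the scale of the dataset: with ~79 million images, even a 3,072-dimensional pixel-space metric finds semantically close neighbors.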