Semantic pyramids for gender and action recognition.

Publication information

IEEE Trans Image Process. 2014 Aug;23(8):3633-45. doi: 10.1109/TIP.2014.2331759. Epub 2014 Jun 18.

Abstract

Person description is a challenging problem in computer vision. We investigated two major aspects of person description: 1) gender and 2) action recognition in still images. Most state-of-the-art approaches for gender and action recognition rely on the description of a single body part, such as face or full-body. However, relying on a single body part is suboptimal due to significant variations in scale, viewpoint, and pose in real-world images. This paper proposes a semantic pyramid approach for pose normalization. Our approach is fully automatic and based on combining information from full-body, upper-body, and face regions for gender and action recognition in still images. The proposed approach does not require any annotations for upper-body and face of a person. Instead, we rely on pretrained state-of-the-art upper-body and face detectors to automatically extract semantic information of a person. Given multiple bounding boxes from each body part detector, we then propose a simple method to select the best candidate bounding box, which is used for feature extraction. Finally, the extracted features from the full-body, upper-body, and face regions are combined into a single representation for classification. To validate the proposed approach for gender recognition, experiments are performed on three large data sets namely: 1) human attribute; 2) head-shoulder; and 3) proxemics. For action recognition, we perform experiments on four data sets most used for benchmarking action recognition in still images: 1) Sports; 2) Willow; 3) PASCAL VOC 2010; and 4) Stanford-40. Our experiments clearly demonstrate that the proposed approach, despite its simplicity, outperforms state-of-the-art methods for gender and action recognition.
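The last two steps described in the abstract (picking one candidate box per part detector, then fusing the per-region features) can be illustrated with a minimal sketch. The abstract does not specify the selection rule or the fusion scheme, so everything below is an assumption made for illustration: the rule "detector score times the fraction of the candidate lying inside the full-body box", the L2 normalization before concatenation, and all function names (`select_best_box`, `combine_features`, `containment`) are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes have area 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def containment(part: Box, body: Box) -> float:
    """Fraction of the part box's area that lies inside the full-body box."""
    ix = max(0.0, min(part[2], body[2]) - max(part[0], body[0]))
    iy = max(0.0, min(part[3], body[3]) - max(part[1], body[1]))
    a = area(part)
    return (ix * iy) / a if a > 0 else 0.0


def select_best_box(
    candidates: Sequence[Tuple[Box, float]], body: Box
) -> Optional[Box]:
    """Pick one candidate per detector.

    ASSUMED rule (not from the paper): maximize detector score multiplied by
    containment in the full-body box. Returns None if no candidate overlaps
    the body box with a positive score.
    """
    best, best_value = None, 0.0
    for box, score in candidates:
        value = score * containment(box, body)
        if value > best_value:
            best, best_value = box, value
    return best


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; an all-zero vector is returned as-is."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def combine_features(
    full_body: Sequence[float],
    upper_body: Sequence[float],
    face: Sequence[float],
) -> List[float]:
    """ASSUMED fusion: concatenate the L2-normalized per-region features."""
    combined: List[float] = []
    for part in (full_body, upper_body, face):
        combined.extend(l2_normalize(part))
    return combined


# Usage with made-up numbers: the second face candidate has a higher score
# but lies entirely outside the person box, so the first one is selected.
body_box = (0.0, 0.0, 100.0, 200.0)
face_candidates = [
    ((30.0, 10.0, 70.0, 50.0), 0.6),
    ((150.0, 10.0, 190.0, 50.0), 0.9),
]
best_face = select_best_box(face_candidates, body_box)
fused = combine_features([3.0, 4.0], [0.0, 2.0], [1.0, 0.0])
```

Normalizing each region separately before concatenation keeps one region's descriptor from dominating the combined vector purely because of its scale; other fusion schemes would fit the abstract's description equally well.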
