Ouyang Wanli, Zhou Hui, Li Hongsheng, Li Quanquan, Yan Junjie, Wang Xiaogang
IEEE Trans Pattern Anal Mach Intell. 2018 Aug;40(8):1874-1887. doi: 10.1109/TPAMI.2017.2738645. Epub 2017 Aug 11.
Feature extraction, deformation handling, occlusion handling, and classification are four important components of pedestrian detection. Existing methods learn or design these components either individually or sequentially, and the interaction among them has not yet been well explored. This paper proposes that the components should be learned jointly so that each can strengthen the others through cooperation. We formulate the four components as a joint deep learning framework and propose a new deep network architecture (code available at www.ee.cuhk.edu.hk/wlouyang/projects/ouyangWiccv13Joint/index.html). By establishing automatic, mutual interaction among the components, the deep model achieves an average miss rate of 8.57%/11.71% on the Caltech benchmark dataset with the new/original annotations.
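For readers unfamiliar with the reported metric, the following is a minimal sketch of how an average miss rate can be computed from a detector's miss-rate versus false-positives-per-image (FPPI) curve. It assumes the convention commonly used with the Caltech benchmark: a log-average over nine FPPI reference points evenly spaced in log space between 10^-2 and 10^0. The abstract does not state the exact protocol, so that convention, the function name `log_average_miss_rate`, and its parameters are illustrative assumptions rather than the authors' evaluation code.

```python
import numpy as np


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of miss rates sampled at log-spaced FPPI values.

    Hypothetical helper, not taken from the paper.
    fppi      : FPPI values of the curve, sorted in ascending order.
    miss_rate : miss rate at each FPPI value (same length as fppi).
    """
    fppi = np.asarray(fppi, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)

    # Assumed convention: reference points log-spaced in [1e-2, 1e0].
    refs = np.logspace(-2.0, 0.0, num_points)

    samples = []
    for ref in refs:
        # Use the last curve point whose FPPI does not exceed the reference.
        idx = np.where(fppi <= ref)[0]
        # If the curve has no point at or below this FPPI, count it as a
        # miss rate of 1.0 (every pedestrian missed).
        samples.append(miss_rate[idx[-1]] if idx.size else 1.0)

    # Clamp to avoid log(0), then average in log space.
    samples = np.maximum(np.array(samples), 1e-10)
    return float(np.exp(np.mean(np.log(samples))))


if __name__ == "__main__":
    # Toy curve: miss rate drops from 0.4 to 0.1 as FPPI grows.
    print(log_average_miss_rate([0.001, 0.5], [0.4, 0.1]))
```

Averaging in log space weights the low-FPPI region, where detectors are typically operated, more evenly than an arithmetic mean over the raw curve would.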