CNN-Based Multimodal Human Recognition in Surveillance Environments.

Affiliation

Division of Electronics and Electrical Engineering, Dongguk University, 30 Pil-dong-ro, 1-gil, Jung-gu, Seoul 100-715, Korea.

Publication information

Sensors (Basel). 2018 Sep 11;18(9):3040. doi: 10.3390/s18093040.

DOI:10.3390/s18093040

PMID:30208648

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6164664/

Abstract

Most current research on human recognition focuses on re-identifying body images captured by multiple cameras in outdoor environments, whereas indoor human recognition has received almost no attention. Previous work on indoor recognition has concentrated mainly on face recognition, because indoor cameras are usually closer to people than outdoor cameras are. However, indoor surveillance cameras are typically mounted near the ceiling and capture images looking downward, so in most cases people do not look directly at them. Frontal face images are therefore often difficult to capture, and without them face recognition accuracy drops sharply. One way to overcome this problem is to use both the face and the body for recognition. Indoor cameras, however, frequently capture only part of the target's body within their field of view, which also reduces recognition accuracy. To address these problems, this paper proposes a multimodal human recognition method that uses both the face and the body and is based on deep convolutional neural networks (CNNs). Specifically, to compensate for partially captured bodies, the face and body are recognized by separate CNNs (VGG Face-16 and ResNet-50, respectively), and their results are combined by score-level fusion using the Weighted Sum rule to improve recognition performance. Experiments on the custom-made Dongguk face and body database (DFB-DB1) and the open ChokePoint database show that the proposed method achieves high recognition accuracy, with equal error rates (EERs) of 1.52% and 0.58%, respectively, compared with face-only or body-only recognition and with methods used in previous studies.
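The score-level fusion and the EER metric mentioned above can be illustrated with a short sketch. This is not the authors' code. The abstract does not define the exact score, so the sketch assumes that each branch (VGG Face-16 for the face, ResNet-50 for the body) has already produced one normalized dissimilarity score per comparison, where lower means "more likely the same person". The function names and the weight `w_face` are hypothetical. In practice the weight would be tuned on training data, and the body weight is taken here as `1 - w_face`. The EER helper sweeps a decision threshold and reports the point where the false acceptance rate (FAR) and false rejection rate (FRR) are closest.

```python
import numpy as np


def weighted_sum_fusion(face_scores, body_scores, w_face=0.5):
    """Combine per-comparison face and body dissimilarity scores with a weighted sum.

    Assumes both inputs are already normalized to a common range, e.g. [0, 1],
    where lower values mean "more likely the same person". w_face is a
    hypothetical weight; the body weight is taken as 1 - w_face.
    """
    if not 0.0 <= w_face <= 1.0:
        raise ValueError("w_face must lie in [0, 1]")
    face = np.asarray(face_scores, dtype=float)
    body = np.asarray(body_scores, dtype=float)
    if face.shape != body.shape:
        raise ValueError("face and body scores must be paired one-to-one")
    return w_face * face + (1.0 - w_face) * body


def equal_error_rate(genuine, impostor):
    """Approximate the EER for dissimilarity scores by sweeping the threshold.

    A comparison is accepted when its score is <= threshold, so
    FAR = share of impostor scores accepted, FRR = share of genuine scores rejected.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best_gap, eer = np.inf, 1.0
    for t in np.unique(np.concatenate([genuine, impostor])):
        far = np.mean(impostor <= t)  # impostors wrongly accepted
        frr = np.mean(genuine > t)    # genuine users wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)


if __name__ == "__main__":
    # Hypothetical paired scores: each position is one probe/gallery comparison.
    face_gen, body_gen = [0.20, 0.65, 0.30], [0.35, 0.25, 0.30]
    face_imp, body_imp = [0.60, 0.85, 0.40], [0.70, 0.55, 0.80]
    fused_gen = weighted_sum_fusion(face_gen, body_gen, w_face=0.5)
    fused_imp = weighted_sum_fusion(face_imp, body_imp, w_face=0.5)
    print("face EER :", equal_error_rate(face_gen, face_imp))
    print("body EER :", equal_error_rate(body_gen, body_imp))
    print("fused EER:", equal_error_rate(fused_gen, fused_imp))
```

A weight below 0.5 shifts trust toward the body branch, which may help when frontal faces are rarely captured. The abstract does not say how the paper chose its fusion weights.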
