Algoritmi Center, University of Minho, 4800-058 Azurém, Portugal.
School of Medicine, University of Minho, 4710-057 Gualtar, Portugal.
Sensors (Basel). 2023 Apr 14;23(8):3993. doi: 10.3390/s23083993.
Multi-human detection and tracking in indoor surveillance is challenging because of occlusions, illumination changes, and complex human-human and human-object interactions. In this study, we address these challenges by exploring the benefits of a low-level sensor fusion approach that combines grayscale and neuromorphic vision sensor (NVS) data. We first generate a custom dataset using an NVS camera in an indoor environment. We then conduct a comprehensive study, experimenting with different image features and deep learning networks, followed by a multi-input fusion strategy to optimize our experiments with respect to overfitting. Our primary goal is to use statistical analysis to determine the best input feature types for multi-human motion detection. We find a significant difference between the input features of optimized backbones, with the best strategy depending on the amount of available data. Specifically, in a low-data regime, event-based frames seem to be the preferred input feature type, while higher data availability favors the combined use of grayscale and optical flow features. Our results demonstrate the potential of sensor fusion and deep learning techniques for multi-human tracking in indoor surveillance, although further studies are needed to confirm our findings.