Opt Express. 2024 May 6;32(10):16645-16656. doi: 10.1364/OE.516681.
Single-Photon Avalanche Diode (SPAD) direct Time-of-Flight (dToF) sensors provide depth imaging over long distances, enabling the detection of objects even in the absence of contrast in colour or texture. However, distant objects are represented by just a few pixels and are subject to noise from solar interference, limiting the applicability of existing computer vision techniques for high-level scene interpretation. We present a new SPAD-based vision system for human activity recognition, based on convolutional and recurrent neural networks, which is trained entirely on synthetic data. In tests using real data from a 64×32 pixel SPAD, captured over a distance of 40 m, the scheme successfully overcomes the limited transverse resolution (in which human limbs are approximately one pixel across), achieving an average accuracy of 89% in distinguishing between seven different activities. The approach analyses continuous streams of video-rate depth data at a maximum rate of 66 FPS when executed on a GPU, making it well-suited for real-time applications such as surveillance or situational awareness in autonomous systems.
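The abstract describes a pipeline in which convolutional layers extract spatial features from each low-resolution depth frame and a recurrent network aggregates them over time into one of seven activity classes. The paper does not give the architecture here, so the following is only a minimal illustrative sketch of that data flow, in NumPy with random (untrained) weights; the frame size matches the 64×32 SPAD, but the filter count, hidden size, clip length, and vanilla-RNN update are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W = 32, 64      # SPAD transverse resolution (64x32 pixels, from the paper)
T = 16             # frames per analysed clip (assumed)
N_CLASSES = 7      # activities to distinguish (from the paper)
HIDDEN = 32        # recurrent hidden size (assumed)

def conv2d(x, k):
    """Valid 2D correlation of a single-channel frame with one filter."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def frame_features(depth, kernels):
    """Per-frame convolutional stage: conv -> ReLU -> global average pool."""
    return np.array([np.maximum(conv2d(depth, k), 0).mean() for k in kernels])

# Hypothetical untrained weights, present only to show shapes and data flow.
kernels = rng.standard_normal((8, 3, 3)) * 0.1           # 8 conv filters
W_x = rng.standard_normal((HIDDEN, 8)) * 0.1             # input-to-hidden
W_h = rng.standard_normal((HIDDEN, HIDDEN)) * 0.1        # hidden-to-hidden
W_out = rng.standard_normal((N_CLASSES, HIDDEN)) * 0.1   # classifier head

h = np.zeros(HIDDEN)
for t in range(T):
    depth = rng.standard_normal((H, W))   # stand-in for one SPAD depth frame
    f = frame_features(depth, kernels)
    h = np.tanh(W_x @ f + W_h @ h)        # recurrent update over the clip

logits = W_out @ h                        # one score per activity class
probs = np.exp(logits - logits.max())
probs /= probs.sum()                      # softmax over the 7 activities
```

Running the full clip through this loop yields a 7-way probability vector per clip; a real-time system would instead update `h` continuously as new frames arrive, emitting a prediction at every frame.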