School of Computer Science and Engineering, VIT-AP University, Amaravati 522237, India.
Sensors (Basel). 2023 Feb 25;23(5):2569. doi: 10.3390/s23052569.
Activity recognition in unmanned aerial vehicle (UAV) surveillance draws on many computer vision tasks, such as image retrieval, pose estimation, object detection (in videos, in still images, and in video frames), face recognition, and video action recognition. In UAV-based surveillance, the video segments captured from aerial vehicles make it challenging to recognize and distinguish human behavior. In this research, a hybrid model combining the histogram of oriented gradients (HOG), the mask region-based convolutional neural network (Mask-RCNN), and bidirectional long short-term memory (Bi-LSTM) is employed to recognize single-human and multi-human activities from aerial data. The HOG algorithm extracts gradient-orientation patterns, Mask-RCNN extracts feature maps from the raw aerial images, and the Bi-LSTM network exploits the temporal relationships between frames to capture the underlying action in the scene. Because it processes the frame sequence in both directions, the Bi-LSTM network substantially reduces the error rate. This novel architecture improves segmentation through HOG-based instance segmentation and improves the accuracy of human activity classification through the Bi-LSTM. Experimental results show that the proposed model outperforms other state-of-the-art models, achieving 99.25% accuracy on the YouTube-Aerial dataset.
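The abstract does not specify the Bi-LSTM's configuration, so the following is only a minimal pure-Python sketch of a generic Bi-LSTM forward pass over per-frame feature vectors, not the authors' implementation. It assumes one hidden state size, randomly initialized weights, and a hypothetical linear softmax head; the names `init_lstm`, `bilstm`, and `classify` are illustrative. In the described pipeline the per-frame vectors would come from the HOG/Mask-RCNN stage; here they are just lists of floats.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]  # gate -> (W input, U recurrent, bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def init_lstm(input_dim: int, hidden_dim: int, seed: int) -> Params:
    """Hypothetical random init for the input (i), forget (f), output (o) and candidate (g) gates."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {gate: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for gate in ("i", "f", "o", "g")}


def lstm_step(x: Vector, h: Vector, c: Vector, params: Params) -> Tuple[Vector, Vector]:
    """One standard LSTM update: c' = f*c + i*g, h' = o*tanh(c')."""
    def pre(gate: str) -> Vector:
        W, U, b = params[gate]
        return [a + r + bb for a, r, bb in zip(matvec(W, x), matvec(U, h), b)]

    i = [sigmoid(z) for z in pre("i")]
    f = [sigmoid(z) for z in pre("f")]
    o = [sigmoid(z) for z in pre("o")]
    g = [math.tanh(z) for z in pre("g")]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def run_lstm(frames: List[Vector], params: Params, hidden_dim: int) -> List[Vector]:
    """Run one LSTM over the frames in the given order and return every hidden state."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    states = []
    for x in frames:
        h, c = lstm_step(x, h, c, params)
        states.append(h)
    return states


def bilstm(frames: List[Vector], fwd: Params, bwd: Params, hidden_dim: int) -> List[Vector]:
    """Concatenate a past-to-future pass with a future-to-past pass at every time step."""
    forward = run_lstm(frames, fwd, hidden_dim)
    backward = run_lstm(frames[::-1], bwd, hidden_dim)[::-1]  # re-align to original time order
    return [f + b for f, b in zip(forward, backward)]


def classify(frames: List[Vector], fwd: Params, bwd: Params, head: Matrix, hidden_dim: int) -> Vector:
    """Hypothetical head: softmax over a linear map of the final forward and final backward states."""
    outputs = bilstm(frames, fwd, bwd, hidden_dim)
    summary = outputs[-1][:hidden_dim] + outputs[0][hidden_dim:]
    logits = matvec(head, summary)
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The bidirectional design means the output at the first frame already reflects the last frame: if only the final frame's features change, the forward half of `bilstm(...)[0]` stays identical while the backward half changes. A unidirectional LSTM cannot provide that future context at early time steps.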