Ruiz-Barroso Paula, González-Linares José María, Castro Francisco M, Guil Nicolás
Department of Computer Architecture, Institute for Mechatronics Engineering & Cyber-Physical Systems (IMECH.UMA), Universidad de Málaga, Málaga, Spain.
Sci Rep. 2025 Jun 2;15(1):19276. doi: 10.1038/s41598-025-02351-x.
Deploying deep learning models on edge devices offers advantages in terms of data security and communication latency. However, optimizing these models to achieve fast computing speeds without sacrificing accuracy can be challenging, especially in video surveillance applications where real-time processing is crucial. In this study, we investigate the deployment of gait recognition models as a multi-objective selection problem in which we seek to simultaneously minimize several objectives, such as latency and energy consumption, while maintaining accuracy. The decision space of the problem comprises all models that can be built by varying parameters such as the size of the model, the operating frequency of the device, and the precision of the operations. From this problem definition, a subset of Pareto optimal models can be selected for deployment on the target device. We conducted experiments with two different gait recognition models on NVIDIA Jetson Orin Nano and Jetson AGX Orin to explore their decision spaces. In addition, we investigated different strategies to increase the throughput of the deployed models by taking advantage of batching and concurrent execution. Together, these techniques allowed us to design real-time solutions for gait recognition in scenarios with multiple subjects. These solutions can process between 42 and 188 simultaneous subjects at 25 inferences per second with an energy consumption ranging from 6.31 to 9.71 mJ per inference, depending on the device and the deployed model.
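The abstract frames deployment as a multi-objective selection over a decision space built from model size, device operating frequency, and operation precision, keeping only Pareto optimal candidates. A minimal sketch of that selection step follows, assuming each candidate configuration has already been measured on the device. It treats accuracy as a third objective to maximize. The abstract's wording ("while maintaining accuracy") could also be read as a constraint, in which case candidates below an accuracy threshold would be filtered out first. The `Config` fields, the `dominates` and `pareto_front` helpers, and all numbers are hypothetical illustrations, not the authors' code or measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Config:
    # Hypothetical decision variables, modeled on those named in the abstract.
    model_size: str    # e.g. "small" or "large"
    freq_mhz: int      # device operating frequency
    precision: str     # e.g. "fp32", "fp16", "int8"
    # Measured objectives: latency and energy are minimized, accuracy is maximized.
    latency_ms: float
    energy_mj: float
    accuracy: float


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = (a.latency_ms <= b.latency_ms
                and a.energy_mj <= b.energy_mj
                and a.accuracy >= b.accuracy)
    strictly_better = (a.latency_ms < b.latency_ms
                       or a.energy_mj < b.energy_mj
                       or a.accuracy > b.accuracy)
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep configurations that no other configuration dominates (O(n^2) check)."""
    return [c for c in configs
            if not any(dominates(o, c) for o in configs if o is not c)]


if __name__ == "__main__":
    # Toy values for illustration only; they are NOT results from the study.
    candidates = [
        Config("small", 600, "int8", 2.0, 5.0, 0.85),
        Config("small", 600, "fp16", 3.0, 7.0, 0.88),
        Config("large", 600, "fp16", 6.0, 12.0, 0.92),
        Config("large", 300, "fp32", 9.0, 15.0, 0.91),  # dominated by the row above
    ]
    for c in pareto_front(candidates):
        print(c)
```

The pairwise check is quadratic in the number of candidates, which is typically acceptable when each candidate must first be profiled on the device anyway. Any configuration on the resulting front is a defensible deployment choice, and the final pick depends on which trade-off between latency, energy, and accuracy the application prioritizes.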