Liu Yu, Andhare Anurag, Kang Kyoung-Don
Department of Computer Science, State University of New York at Binghamton, 4400 Vestal Parkway East, Binghamton, NY 13902, USA.
Sensors (Basel). 2024 Aug 14;24(16):5262. doi: 10.3390/s24165262.
Intelligent mobile image sensing powered by deep learning analyzes images captured by the cameras of mobile devices, such as smartphones and smartwatches. It supports numerous mobile applications, including image classification, face recognition, and camera scene detection. Unfortunately, mobile devices often lack the resources that deep learning requires, which leads to high inference latency and rapid battery drain. Moreover, inference accuracy may decline over time due to data drift. To address these issues, we introduce Corun, a cost-efficient framework that runs multiple inference queries concurrently with continual retraining/fine-tuning of a pre-trained model on a single commodity GPU in an edge server, significantly improving inference throughput while maintaining inference accuracy. Corun's scheduling method uses offline profiling to find the maximum number of concurrent inferences that can be executed alongside a retraining job on a single GPU without incurring an out-of-memory error or significantly increasing latency. Our evaluation verifies the cost-effectiveness of Corun. The inference throughput provided by Corun scales with the number of concurrent inference queries, whereas the latency of inference queries and the length of a retraining epoch increase at substantially lower rates. By concurrently processing multiple inference and retraining tasks on one GPU instead of using a separate GPU for each task, Corun could reduce the number of GPUs, and hence the cost, required to deploy deep-learning-based mobile image sensing applications at the edge.
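The offline profiling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `ProbeResult`, `max_concurrent_inferences`, and `simulated_probe` are hypothetical, the memory and latency figures are invented for illustration, and the sketch assumes that memory use and latency grow monotonically with the number of concurrent inferences, so the search can stop at the first failing level. It also assumes the latency limit is given as an absolute budget in milliseconds. In a real deployment the probe would launch a retraining job together with `n` inference workers on the GPU and measure the outcome.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbeResult:
    """Outcome of one profiling run (hypothetical structure)."""
    oom: bool          # True if the run hit an out-of-memory error
    latency_ms: float  # observed inference latency; ignored when oom is True


def max_concurrent_inferences(
    probe: Callable[[int], ProbeResult],
    max_candidates: int,
    latency_budget_ms: float,
) -> int:
    """Return the largest n in 1..max_candidates for which running n
    inferences next to one retraining job neither runs out of memory
    nor exceeds the latency budget. Returns 0 if even n = 1 fails.

    Assumes failures are monotone in n, so the scan stops at the
    first level that fails.
    """
    best = 0
    for n in range(1, max_candidates + 1):
        result = probe(n)
        if result.oom or result.latency_ms > latency_budget_ms:
            break
        best = n
    return best


def simulated_probe(n: int) -> ProbeResult:
    """Stand-in for a real GPU measurement. All numbers are made up:
    a 16 GB GPU, 6 GB for retraining, 1.5 GB per inference worker,
    and latency that grows by 2 ms per extra concurrent inference."""
    gpu_memory_gb = 16.0
    used_gb = 6.0 + 1.5 * n
    if used_gb > gpu_memory_gb:
        return ProbeResult(oom=True, latency_ms=0.0)
    return ProbeResult(oom=False, latency_ms=20.0 + 2.0 * (n - 1))


if __name__ == "__main__":
    # Memory-bound case: a generous latency budget, so OOM is the limit.
    print(max_concurrent_inferences(simulated_probe, 32, latency_budget_ms=100.0))
    # Latency-bound case: a tight budget stops the search before OOM.
    print(max_concurrent_inferences(simulated_probe, 32, latency_budget_ms=25.0))
```

Because the profiling happens offline, a simple linear scan is usually affordable; if each probe run were expensive, the same monotonicity assumption would allow a binary search instead.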