Korea Electronics Technology Institute, Seongnam 13509, Korea.
Sensors (Basel). 2022 Sep 5;22(17):6717. doi: 10.3390/s22176717.
As deep learning technology matures, real-world applications built on it are becoming increasingly common. Edge computing is one service architecture for realizing deep-learning-based services; it makes use of resources near the data source or client. In an edge computing architecture, managing resource usage is important, and prior research addresses this in two ways: optimizing deep learning models to be more lightweight, for example through pruning or binarization, and distributing workloads efficiently across cloud and edge resources. Both approaches aim to reduce the workload on edge resources. In this paper, a usage optimization method based on batch and model management is proposed. The method increases the utilization of GPU resources by adjusting the batch size of the input to an inference application. To this end, the inference pipelines are profiled to identify how different kinds of resources are used, and the effect of batch inference on the GPU is measured. The proposed method consists of several modules, including a batch size management tool that changes the batch size according to the available resources, and a model management tool that supports on-the-fly model updates. The proposed method is implemented in a real-time video analysis application and deployed as a Docker container in a Kubernetes cluster. The results show that the proposed method can optimize the usage of edge resources for real-time video analysis deep learning applications.
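The core idea of the batch size management described above — picking a batch size with respect to the resources currently available on the edge node — can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the candidate batch sizes, the per-sample and per-model memory costs, and the function name `choose_batch_size` are all hypothetical, not taken from the paper's implementation.

```python
# Hypothetical sketch of batch-size management: select the largest candidate
# batch size whose estimated GPU memory footprint fits in the memory that is
# currently free. All constants below are illustrative assumptions.

CANDIDATE_BATCH_SIZES = [1, 2, 4, 8, 16, 32]
PER_SAMPLE_MEMORY_MB = 120   # assumed memory cost per input frame
MODEL_MEMORY_MB = 900        # assumed fixed cost of the loaded model

def choose_batch_size(free_gpu_memory_mb: int) -> int:
    """Return the largest candidate batch size that fits in free GPU memory.

    Falls back to a batch size of 1 when even the smallest candidate
    does not fit, so the pipeline degrades rather than fails outright.
    """
    best = 1
    for batch in CANDIDATE_BATCH_SIZES:
        if MODEL_MEMORY_MB + batch * PER_SAMPLE_MEMORY_MB <= free_gpu_memory_mb:
            best = batch
    return best

# With ~3 GB free, the model (900 MB) plus 16 frames (1920 MB) fits,
# while 32 frames (3840 MB) would exceed the budget.
print(choose_batch_size(3000))
```

In a real deployment, `free_gpu_memory_mb` would come from a GPU monitoring interface (e.g. NVML) rather than a constant, and the per-sample cost would be measured by profiling the inference pipeline, as the paper describes.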