Xie Jing, Yan Yuzhong, Saxena Abhishek, Qiu Qiang, Chen Jiangong, Sun Hongyu, Chen Rong, Bhattacharyya Shuvra S
Department of Electrical and Computer Engineering, University of Maryland at College Park, 8223 Paint Branch Dr, College Park, MD, 20740, USA.
OPPO Seattle Research Center, 10940 NE 33rd Pl #202, Bellevue, WA, 98004, USA.
Neurocomputing (Amst). 2025 Jan 1;611. doi: 10.1016/j.neucom.2024.128628. Epub 2024 Sep 19.
Inference using deep neural networks on mobile devices has been an active area of research in recent years. The design of a deep learning inference framework targeted for mobile devices needs to consider various factors, such as the limited computational capacity of the devices, low power budget, varied memory access methods, and I/O bus bandwidth governed by the underlying processor's architecture. Furthermore, integrating an inference framework with time-sensitive applications - such as games and video-based software performing tasks like ray-tracing denoising and video processing - introduces the need to minimize data movement between processors and increase data locality in the target processor. In this paper, we propose Shader Neural Network (ShaderNN), an OpenGL-based, fast, and power-efficient inference framework designed for mobile devices to address these challenges. Our contributions include the following: (1) texture-based input/output provides efficient, zero-copy integration with real-time graphics pipelines or image processing applications, thereby saving expensive data transfers between CPU and GPU, which are unavoidable in most existing inference engines; (2) we are the first to leverage fragment shaders on the OpenGL backend for neural network inference operators, which is advantageous for deploying neural network models with small parameter counts; (3) a hybrid implementation of compute shaders and fragment shaders is proposed that enables layer-level shader selection to boost performance; and (4) we utilize OpenGL features - such as normalization, interpolation, and texture padding - to improve performance. Experiments illustrate the favorable performance of ShaderNN over other popular on-device deep learning frameworks, such as TensorFlow Lite, on the latest mobile devices powered by Qualcomm and MediaTek chips. A case study further demonstrates the usability of the ShaderNN framework and its seamless integration with a media-processing Android application. ShaderNN is available as open source on GitHub (https://github.com/inferenceengine/shadernn).
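To make the fragment-shader-based inference idea and the texture-based input/output concrete, the following is a minimal sketch, not ShaderNN's actual API: a single-channel 3x3 convolution expressed as an OpenGL ES 3.0 fragment shader embedded in C++, with the input feature map read from a texture and the output intended to be rasterized into a texture attached to a framebuffer object. The shader source, helper function, and uniform names here are hypothetical illustrations, and the snippet assumes an EGL/OpenGL ES context is already current.

// Hypothetical sketch (not the ShaderNN API): a single-channel 3x3
// convolution as an OpenGL ES 3.0 fragment shader. The input feature map
// is sampled from a texture and the result is rasterized into a texture
// bound to a framebuffer object, which is the kind of texture-based,
// zero-copy I/O path described in the abstract.
#include <GLES3/gl3.h>
#include <cstdio>

// Fragment shader: one invocation per output pixel; kernel weights are
// passed as a uniform array (workable for parametrically small models).
static const char* kConv3x3Frag = R"(#version 300 es
precision highp float;
uniform sampler2D uInput;      // input feature map
uniform float uKernel[9];      // 3x3 convolution weights, row-major
out vec4 fragColor;
void main() {
    ivec2 p = ivec2(gl_FragCoord.xy);
    float acc = 0.0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            // Border handling is not automatic with texelFetch; padding
            // would be handled via texture wrap modes or explicit clamping.
            float v = texelFetch(uInput, p + ivec2(dx, dy), 0).r;
            acc += v * uKernel[(dy + 1) * 3 + (dx + 1)];
        }
    }
    fragColor = vec4(max(acc, 0.0));   // fused ReLU on the accumulated value
}
)";

// Compile one shader stage; requires a current EGL/OpenGL ES context.
GLuint CompileShader(GLenum stage, const char* src) {
    GLuint shader = glCreateShader(stage);
    glShaderSource(shader, 1, &src, nullptr);
    glCompileShader(shader);
    GLint ok = GL_FALSE;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &ok);
    if (!ok) {
        char log[1024];
        glGetShaderInfoLog(shader, sizeof(log), nullptr, log);
        std::fprintf(stderr, "shader compile failed: %s\n", log);
    }
    return shader;
}

In a full pipeline of this style, each layer would draw a full-screen quad into a framebuffer whose color attachment serves as the next layer's input texture, so intermediate feature maps never leave the GPU; this is the design rationale behind the zero-copy integration with graphics and image-processing applications claimed above.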