A Novel Automate Python Edge-to-Edge: From Automated Generation on Cloud to User Application Deployment on Edge of Deep Neural Networks for Low Power IoT Systems FPGA-Based Acceleration.

Affiliations

Electronics and Microelectronics Unit (SEMi), University of Mons, 7000 Mons, Belgium.

Ecole Nationale d'Ingénieurs de Sousse, Université de Sousse, Sousse 4000, Tunisia.

Publication

Sensors (Basel). 2021 Sep 9;21(18):6050. doi: 10.3390/s21186050.

Abstract

Deep Neural Networks (DNNs) deployment for IoT Edge applications requires strong skills in hardware and software. In this paper, a novel design framework fully automated for Edge applications is proposed to perform such a deployment on System-on-Chips. Based on a high-level Python interface that mimics the leading Deep Learning software frameworks, it offers an easy way to implement a hardware-accelerated DNN on an FPGA. To do this, our design methodology covers the three main phases: (a) customization: where the user specifies the optimizations needed on each DNN layer, (b) generation: the framework generates on the Cloud the necessary binaries for both FPGA and software parts, and (c) deployment: the SoC on the Edge receives the resulting files serving to program the FPGA and related Python libraries for user applications. Among the study cases, an optimized DNN for the MNIST database can speed up more than 60× a software version on the ZYNQ 7020 SoC and still consume less than 0.43W. A comparison with the state-of-the-art frameworks demonstrates that our methodology offers the best trade-off between throughput, power consumption, and system cost.
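The three-phase workflow the abstract describes (customization, cloud-side generation, edge deployment) can be sketched as a toy Python model. Note this is a minimal illustration of the concept only: every class and method name here (`Layer`, `EdgeModel`, `generate`, `deploy`) is a hypothetical stand-in, not the framework's actual API, and the "bitstream" is represented as a plain string rather than real FPGA configuration data.

```python
# Illustrative sketch of the customization -> generation -> deployment flow.
# All names are assumptions for illustration, not the paper's real interface.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Layer:
    """One DNN layer plus its user-specified per-layer optimizations."""
    kind: str             # e.g. "dense", "conv"
    units: int
    quant_bits: int = 8   # per-layer quantization chosen at customization
    parallelism: int = 1  # per-layer unrolling factor for the FPGA


@dataclass
class EdgeModel:
    layers: List[Layer] = field(default_factory=list)

    # Phase (a) customization: the user stacks layers, Keras-style,
    # and picks the optimization knobs for each one.
    def add(self, layer: Layer) -> "EdgeModel":
        self.layers.append(layer)
        return self

    # Phase (b) generation: on the cloud, emit the FPGA configuration and
    # the Python runtime bindings (both represented here as plain strings).
    def generate(self) -> dict:
        bitstream = "|".join(
            f"{l.kind}:{l.units}:q{l.quant_bits}:p{l.parallelism}"
            for l in self.layers
        )
        return {"bitstream": bitstream, "runtime": "edge_runtime.py"}

    # Phase (c) deployment: the SoC on the edge receives the generated
    # files and programs the FPGA with them.
    @staticmethod
    def deploy(artifacts: dict) -> str:
        return f"FPGA programmed with {len(artifacts['bitstream'])}-byte config"


if __name__ == "__main__":
    model = EdgeModel()
    model.add(Layer("dense", 128, quant_bits=4, parallelism=8))
    model.add(Layer("dense", 10))
    artifacts = model.generate()
    print(EdgeModel.deploy(artifacts))
```

The point of the sketch is the division of labor: the user only writes the high-level phase (a) code, while phases (b) and (c) are automated, which is what removes the need for hardware-design skills on the user's side.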

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/499c/8467982/fe9c02e665ce/sensors-21-06050-g001.jpg
