Venieris Stylianos I, Bouganis Christos-Savvas
IEEE Trans Neural Netw Learn Syst. 2019 Feb;30(2):326-342. doi: 10.1109/TNNLS.2018.2844093. Epub 2018 Jul 2.
Since the renaissance of neural networks, convolutional neural networks (ConvNets) have demonstrated state-of-the-art performance in several emerging artificial intelligence tasks. The deployment of ConvNets in real-life applications requires power-efficient designs that meet application-level performance needs. In this context, field-programmable gate arrays (FPGAs) offer a platform that can be tailored to application-specific requirements. However, with the complexity of ConvNet models increasing rapidly, the ConvNet-to-FPGA design space becomes prohibitively large. This paper presents fpgaConvNet, an end-to-end framework for the optimized mapping of ConvNets on FPGAs. The proposed framework comprises an automated design methodology based on the synchronous dataflow (SDF) paradigm and defines a set of SDF transformations to navigate the architectural design space efficiently. Through a systematic multiobjective optimization formulation, the framework generates hardware designs that are co-optimized for the ConvNet workload, the target device, and the application's performance metric of interest. Quantitative evaluation shows that the proposed methodology yields hardware designs that improve performance by up to 6.65× over highly optimized graphics processing unit (GPU) designs under the same power constraints and achieve up to 2.94× higher performance density than state-of-the-art FPGA-based ConvNet architectures.