

FPGA-Based Hybrid-Type Implementation of Quantized Neural Networks for Remote Sensing Applications.

Affiliations

Beijing Key Laboratory of Embedded Real-time Information Processing Technology, Beijing Institute of Technology, Beijing 100081, China.

School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China.

Publication information

Sensors (Basel). 2019 Feb 22;19(4):924. doi: 10.3390/s19040924.

Abstract

Recently, extensive convolutional neural network (CNN)-based methods have been used in remote sensing applications, such as object detection and classification, and have achieved significant improvements in performance. Furthermore, there is strong demand for hardware implementations of real-time remote sensing processing. However, the operation and storage requirements of floating-point models hinder the deployment of networks on hardware with limited resource and power budgets, such as field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). To solve this problem, this paper focuses on optimizing the hardware design of CNNs with low bit-width integers through quantization. First, a hybrid-type inference method based on a symmetric quantization scheme is proposed, which replaces floating-point precision with low bit-width integers. Then, a training approach for the quantized network is introduced to reduce accuracy degradation. Finally, a low bit-width processing engine (PE) is proposed to optimize the FPGA hardware design for remote sensing image classification. In addition, a fused-layer PE is presented for state-of-the-art CNNs equipped with Batch Normalization and LeakyReLU. Experiments performed on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset using a graphics processing unit (GPU) demonstrate that the accuracy of the 8-bit quantized model drops by about 1%, which is an acceptable loss. The accuracy measured on the FPGA is consistent with that of the GPU. As for FPGA resource consumption, Look-Up Table (LUT), Flip-Flop (FF), Digital Signal Processor (DSP), and Block Random Access Memory (BRAM) usage are reduced by 46.21%, 43.84%, 45%, and 51%, respectively, compared with the floating-point implementation.
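The symmetric quantization scheme referenced above can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation: the per-tensor scale, zero-point fixed at 0, and clipping to the signed integer range are assumptions based on standard symmetric quantization practice.

```python
import numpy as np

def symmetric_quantize(x, num_bits=8):
    """Symmetrically quantize a float tensor to signed integers.

    Maps [-max|x|, +max|x|] onto [-(2^(b-1)-1), 2^(b-1)-1] with a
    single per-tensor scale factor; the zero-point is fixed at 0.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    scale = np.max(np.abs(x)) / qmax        # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from integers and scale."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight tensor to int8 and reconstruct it.
w = np.array([-0.5, 0.1, 0.25, 0.49], dtype=np.float32)
q, s = symmetric_quantize(w)
w_hat = dequantize(q, s)
```

Because the scale is shared and the zero-point is 0, integer multiply-accumulate results can be rescaled with a single multiplication, which is what makes such schemes attractive for FPGA processing engines.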


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ad3/6412419/41496ab9fdfd/sensors-19-00924-g001.jpg
