Efficient Neural Networks on the Edge with FPGAs by Optimizing an Adaptive Activation Function.

Authors

Jiang Yiyue, Vaicaitis Andrius, Dooley John, Leeser Miriam

Affiliations

Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA.

Department of Electronic Engineering, Maynooth University, W23 F2H6 Maynooth, Ireland.

Publication information

Sensors (Basel). 2024 Mar 13;24(6):1829. doi: 10.3390/s24061829.

DOI:10.3390/s24061829

PMID:38544092

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10974330/

Abstract

Implementing neural networks (NNs) on edge devices enables local processing of wireless data, but deep neural networks (DNNs) bring high computational complexity and large memory requirements. Shallow neural networks customized for specific problems are more efficient: they require fewer resources and yield lower latency. Their smaller size also makes them suitable for real-time processing on edge devices. The main concern with shallow neural networks is their accuracy relative to DNNs. In this paper, we demonstrate that a shallow network with a customized adaptive activation function (AAF) can match the accuracy of a DNN. We designed an efficient FPGA implementation of a customized segmented spline curve neural network (SSCNN) structure, which replaces the traditional fixed activation function with an AAF. We compared our SSCNN with other neural network structures: a real-valued time-delay neural network (RVTDNN), an augmented real-valued time-delay neural network (ARVTDNN), and deep neural networks with different parameters. Our proposed SSCNN implementation uses 40% fewer hardware resources than a DNN of similar accuracy and requires no block RAMs. We experimentally validated this computationally efficient and memory-saving FPGA implementation of the SSCNN for digital predistortion of radio-frequency (RF) power amplifiers using the AMD/Xilinx RFSoC ZCU111. The implemented solution uses less than 3% of the available resources. It also allows the clock frequency to be raised to 221.12 MHz, enabling the transmission of wide-bandwidth signals.
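To make the idea of an adaptive activation function concrete, the following is a minimal sketch in plain Python. It is not the authors' SSCNN: the abstract does not give the spline formulation, so this example assumes a simpler piecewise-linear curve whose control values are trained alongside the network weights. The names `SegmentedActivation` and `fit`, the segment count, the input range, and the learning rate are all hypothetical, and the arithmetic is floating point rather than the fixed-point arithmetic an FPGA design would typically use.

```python
import math


class SegmentedActivation:
    """Hypothetical piecewise-linear adaptive activation (illustrative only).

    Assumption: evenly spaced knots on [-x_max, x_max], each with one
    trainable control value; the output is a linear interpolation between
    the two control values that bracket the input.
    """

    def __init__(self, num_segments=8, x_max=2.0, init=math.tanh):
        self.num_segments = num_segments
        self.x_max = x_max
        self.step = 2.0 * x_max / num_segments
        self.knots = [-x_max + i * self.step for i in range(num_segments + 1)]
        # Start from a conventional fixed activation (tanh by default).
        self.values = [init(k) for k in self.knots]

    def _locate(self, x):
        # Clamp to the covered range, then return the segment index and the
        # fractional position u in [0, 1] inside that segment.
        x = max(-self.x_max, min(self.x_max, x))
        pos = (x + self.x_max) / self.step
        idx = min(int(pos), self.num_segments - 1)
        return idx, pos - idx

    def __call__(self, x):
        idx, u = self._locate(x)
        return (1.0 - u) * self.values[idx] + u * self.values[idx + 1]

    def update(self, x, grad_out, lr=0.1):
        # Gradient step on the two control values that produced the output.
        idx, u = self._locate(x)
        self.values[idx] -= lr * grad_out * (1.0 - u)
        self.values[idx + 1] -= lr * grad_out * u


def fit(act, target, samples, epochs=200, lr=0.1):
    """Adapt the activation's shape to a target curve (squared-error loss)."""
    for _ in range(epochs):
        for x in samples:
            error = act(x) - target(x)  # d(0.5 * error**2) / d(output)
            act.update(x, error, lr)
    return act


if __name__ == "__main__":
    samples = [-2.0 + 0.1 * i for i in range(41)]
    act = fit(SegmentedActivation(), abs, samples)
    print(max(abs(act(x) - abs(x)) for x in samples))
```

In this sketch only the two control values adjacent to an input take part in each evaluation, so the per-sample cost does not grow with the number of segments. That locality is the kind of property that makes segmented activations attractive for hardware, although the resource and clock figures reported in the abstract come from the authors' own design, not from this example.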
