Ling Tianheng, Qian Chao, Klann Theodor Mario, Hoever Julian, Einhaus Lukas, Schiele Gregor
Intelligent Embedded Systems, Department of Computer Science, University of Duisburg-Essen, 47057 Duisburg, Germany.
Sensors (Basel). 2024 Dec 26;25(1):83. doi: 10.3390/s25010083.
This study presents a comprehensive workflow for developing and deploying Multi-Layer Perceptron (MLP)-based soft sensors on embedded FPGAs, addressing diverse deployment objectives. The proposed workflow extends our prior research by introducing greater model adaptability. It supports various configurations, spanning layer counts, neuron counts, and quantization bitwidths, to accommodate the constraints and capabilities of different FPGA platforms. The workflow incorporates a custom-developed, open-source toolchain that facilitates quantization-aware training, integer-only inference, automated accelerator generation using VHDL templates, and synthesis alongside performance estimation. A case study on fluid flow estimation was conducted on two FPGA platforms: the AMD Spartan-7 XC7S15 and the Lattice iCE40UP5K. For precision-focused and latency-sensitive deployments, a six-layer, 60-neuron MLP accelerator quantized to 8 bits on the XC7S15 achieved an MSE of 56.56, a MAPE of 1.61%, and an inference latency of 23.87 μs. Moreover, for low-power and energy-constrained deployments, a five-layer, 30-neuron MLP accelerator quantized to 8 bits on the iCE40UP5K achieved an inference latency of 83.37 μs, a power consumption of 2.06 mW, and an energy consumption of just 0.172 μJ per inference. These results confirm the workflow's ability to identify optimal FPGA accelerators tailored to specific deployment requirements, achieving a balanced trade-off between precision, inference latency, and energy efficiency.
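To make the "integer-only inference" notion concrete, the sketch below shows how one dense layer of an 8-bit quantized MLP can be evaluated using only integer arithmetic: int8 inputs and weights, an int32 accumulator, and a fixed-point multiplier-and-shift requantization back to int8. This is a generic illustration of the technique, not the paper's VHDL toolchain; the function names and the `multiplier`/`shift` parameters are assumptions for the example.

```python
def requantize(acc: int, multiplier: int, shift: int) -> int:
    """Rescale an int32 accumulator to int8 using a fixed-point
    multiplier: (acc * multiplier) >> shift, saturated to [-128, 127]."""
    scaled = (acc * multiplier) >> shift
    return max(-128, min(127, scaled))

def quantized_dense(x, weights, bias, multiplier, shift):
    """One integer-only dense layer: y = requantize(W @ x + b).

    x        -- list of int8 activations
    weights  -- list of rows (one per neuron), int8 entries
    bias     -- list of int32 biases, one per neuron
    """
    out = []
    for row, b in zip(weights, bias):
        # int8 * int8 products summed into a wide (int32) accumulator
        acc = b + sum(w * xi for w, xi in zip(row, x))
        out.append(requantize(acc, multiplier, shift))
    return out

# Tiny example: 2 neurons, 3 inputs, all values within int8 range
x = [10, -5, 3]
W = [[1, 2, -1], [0, 4, 2]]
b = [16, -8]
y = quantized_dense(x, W, b, multiplier=1, shift=2)
```

In a real quantization-aware-training flow, `multiplier` and `shift` are derived offline from the float scales of the input, weight, and output tensors, so that no floating-point operation is needed on the FPGA at inference time.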