Wielgosz Maciej, Karwatowski Michał
Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science and Technology, al. Adama Mickiewicza 30, 30-059 Cracow, Poland.
Academic Computer Centre CYFRONET AGH, ul. Nawojki 11, 30-072 Cracow, Poland.
Sensors (Basel). 2019 Jul 5;19(13):2981. doi: 10.3390/s19132981.
In Internet of things (IoT) infrastructure, fast access to knowledge becomes critical. In some application domains, such as robotics, autonomous driving, predictive maintenance, and anomaly detection, the response time of the system is more critical for ensuring Quality of Service than the quality of the answer. In this paper, we propose a methodology: a set of predefined steps for mapping neural network models to hardware, especially field programmable gate arrays (FPGAs), with the main focus on latency reduction. A multi-objective covariance matrix adaptation evolution strategy (MO-CMA-ES) was employed along with custom scores for sparsity, bit-width of the representation, and quality of the model. Furthermore, we created a framework which enables mapping of neural models to FPGAs. The proposed solution is validated using three case studies, with a Xilinx Zynq UltraScale+ MPSoC XCZU15EG as the platform. The results show the compression ratios achieved by quantization and pruning in different scenarios, with and without retraining procedures. Using our publicly available framework, we achieved a latency of 210 ns for a single processing step of a model composed of two long short-term memory (LSTM) layers and a single dense layer.
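The abstract does not include code; as a rough illustration of the kind of objective vector such a multi-objective search might evaluate, a minimal sketch is given below, assuming uniform symmetric quantization and magnitude-based pruning. The candidate encoding, function names, and the quality callback are illustrative assumptions, not taken from the paper or its framework.

```python
# Hypothetical sketch (not the paper's code): a three-objective score for a
# compression candidate, as could be fed to an MO-CMA-ES optimiser that
# minimises quality loss, bit-width cost, and remaining weight density.
import numpy as np

def quantize(weights, bits):
    """Uniform symmetric quantization of a weight tensor to `bits` bits."""
    scale = np.max(np.abs(weights)) / (2 ** (bits - 1) - 1)
    return np.round(weights / scale) * scale

def prune(weights, threshold):
    """Zero out weights whose magnitude falls below `threshold`."""
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def objectives(candidate, weights, evaluate_quality):
    """Return (quality loss, bit-width cost, density) for one candidate.

    `candidate` = (bits, pruning threshold); `evaluate_quality` is a
    user-supplied callback that runs the compressed model on a validation
    set and returns a loss to minimise. All names here are illustrative.
    """
    bits = int(round(candidate[0]))
    threshold = float(candidate[1])
    compressed = quantize(prune(weights, threshold), bits)
    quality_loss = evaluate_quality(compressed)
    bit_cost = bits / 32.0                                  # relative to float32
    density = np.count_nonzero(compressed) / compressed.size
    return quality_loss, bit_cost, density
```

In a setup like this, the optimiser proposes candidate (bit-width, threshold) pairs and keeps the Pareto-optimal trade-offs between model quality and hardware cost, which is the role the abstract assigns to MO-CMA-ES with its custom sparsity, bit-width, and quality scores.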