FPGA-Based Processor Acceleration for Image Processing Applications.

Authors

Siddiqui Fahad, Amiri Sam, Minhas Umar Ibrahim, Deng Tiantai, Woods Roger, Rafferty Karen, Crookes Daniel

Affiliations

School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast BT7 1NN, UK.

School of Computing, Electronics and Maths, Coventry University, Coventry CV1 5FB, UK.

Publication information

J Imaging. 2019 Jan 13;5(1):16. doi: 10.3390/jimaging5010016.

DOI:10.3390/jimaging5010016

PMID:34465705

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8320866/

Abstract

FPGA-based embedded image processing systems offer considerable computing resources but present programming challenges when compared to software systems. The paper describes an approach based on an FPGA-based soft processor called IPPro, which can operate at up to 337 MHz on a high-end Xilinx FPGA family, and gives details of its dataflow-based programming environment. The approach is demonstrated for a k-means clustering operation and a traffic sign recognition application, both of which have been prototyped on an Avnet Zedboard that has a Xilinx Zynq-7000 system-on-chip (SoC). A number of parallel dataflow mapping options were explored. Compared to their equivalent ARM-based software implementations, using 16 IPPro cores gives a speed-up of 8 times for the k-means clustering and a speed-up of 9.6 times for the morphology filter operation of the traffic sign recognition. We show that for k-means clustering, the 16-core IPPro implementation is 57, 28 and 1.7 times more power efficient (fps/W) than an ARM Cortex-A7 CPU, an nVIDIA GeForce GTX980 GPU and an ARM Mali-T628 embedded GPU, respectively.
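To make the benchmarked k-means operation concrete, here is a minimal Python sketch of Lloyd's k-means on 1-D grayscale pixel intensities. It is not the authors' IPPro dataflow code. The split of the pixel stream into `cores` slices is our own assumption, meant only to show why the per-pixel assignment step parallelises well: each slice produces partial per-cluster sums and counts independently, and a cheap reduction merges them. The function names (`nearest`, `partial_stats`, `kmeans_1d`), the even-spread initialisation and the iteration cap are all hypothetical. How IPPro actually partitions the work, and the reported 8x speed-up, come from the paper's own mappings, not from this code.

```python
from typing import List, Tuple


def nearest(p: float, centroids: List[float]) -> int:
    """Index of the centroid closest to intensity p (1-D absolute distance)."""
    return min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))


def partial_stats(chunk: List[int], centroids: List[float]) -> Tuple[List[float], List[int]]:
    """Work done by one hypothetical core: per-cluster sums and counts for its slice."""
    k = len(centroids)
    sums, counts = [0.0] * k, [0] * k
    for p in chunk:
        j = nearest(p, centroids)  # assignment step, independent for every pixel
        sums[j] += p
        counts[j] += 1
    return sums, counts


def kmeans_1d(pixels: List[int], k: int, cores: int = 16,
              iters: int = 20) -> Tuple[List[float], List[int]]:
    """Lloyd's k-means on grayscale intensities, pixel stream split into `cores` slices."""
    lo, hi = min(pixels), max(pixels)
    # Assumed initialisation: k centroids spread evenly over the intensity range.
    centroids = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    step = -(-len(pixels) // cores)  # ceiling division so every pixel lands in a slice
    chunks = [pixels[i:i + step] for i in range(0, len(pixels), step)]
    for _ in range(iters):
        sums, counts = [0.0] * k, [0] * k
        # Map: each slice is processed independently, as a separate core could do.
        for s, c in (partial_stats(ch, centroids) for ch in chunks):
            # Reduce: merge the partial sums and counts from every slice.
            for j in range(k):
                sums[j] += s[j]
                counts[j] += c[j]
        # Update: move each centroid to the mean of its members (keep it if empty).
        new = [sums[j] / counts[j] if counts[j] else centroids[j] for j in range(k)]
        if new == centroids:
            break  # converged: no centroid moved
        centroids = new
    # Final labels are computed against the final centroids.
    labels = [nearest(p, centroids) for p in pixels]
    return centroids, labels


if __name__ == "__main__":
    pixels = [10, 12, 11, 200, 205, 198, 100, 102]
    centroids, labels = kmeans_1d(pixels, k=3, cores=4)
    print(centroids)  # [11.0, 101.0, 201.0]
    print(labels)     # [0, 0, 0, 2, 2, 2, 1, 1]
```

Because the partial sums are merged before the update, the result does not depend on how many slices are used. Only the amount of work each slice carries changes, which is what a multi-core mapping exploits.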