IEEE Trans Neural Netw Learn Syst. 2013 Aug;24(8):1239-52. doi: 10.1109/TNNLS.2013.2253563.
Recognition of objects in still images has traditionally been regarded as a difficult computational problem. Although modern automated methods for visual object recognition have achieved steadily increasing recognition accuracy, even the most advanced computational vision approaches are unable to match human performance. This has led to the creation of many biologically inspired models of visual object recognition, among them the hierarchical model and X (HMAX). HMAX is traditionally known to achieve high accuracy in visual object recognition tasks at the expense of significant computational complexity. Increasing complexity, in turn, increases computation time, reducing the number of images that can be processed per unit time. In this paper, we describe how the computationally intensive and biologically inspired HMAX model for visual object recognition can be modified for implementation on a commercial field-programmable gate array (FPGA), specifically the Xilinx Virtex 6 ML605 evaluation board with an XC6VLX240T FPGA. We show that with minor modifications to the traditional HMAX model we can perform recognition on images of size 128 × 128 pixels at a rate of 190 images per second with a less than 1% loss in recognition accuracy in both binary and multiclass visual object recognition tasks.
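To put the reported rate in concrete terms, the short sketch below converts the two figures stated in the abstract (128 × 128 pixel images, 190 images per second) into a pixel throughput and an average time budget per image. The function name and dictionary keys are illustrative assumptions, not from the paper, and the per-image time is only the average interval between completed images; it says nothing about the latency of a pipelined design.

```python
# Figures taken from the abstract; everything else is illustrative.
IMAGE_WIDTH = 128        # pixels
IMAGE_HEIGHT = 128       # pixels
IMAGES_PER_SECOND = 190  # reported recognition rate


def throughput_summary(width, height, images_per_second):
    """Derive simple throughput figures from image size and frame rate.

    Hypothetical helper for illustration only; it is not part of the
    HMAX model or of the FPGA implementation described in the paper.
    """
    pixels_per_image = width * height
    return {
        "pixels_per_image": pixels_per_image,
        "pixels_per_second": pixels_per_image * images_per_second,
        # Average interval between completed images, in milliseconds.
        "ms_per_image": 1000.0 / images_per_second,
    }


summary = throughput_summary(IMAGE_WIDTH, IMAGE_HEIGHT, IMAGES_PER_SECOND)
for name, value in summary.items():
    print(f"{name}: {value:,.3f}")
```

Under these assumptions, the system consumes roughly 3.1 million input pixels per second and completes one image about every 5.3 ms on average.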