Shi Cong, Luo Gang
Department of Ophthalmology, Harvard Medical School, Schepens Eye Research Institute, Massachusetts Eye and Ear, Boston, MA 02114 USA.
IEEE Trans Circuits Syst Video Technol. 2018 Apr;28(4):1021-1036. doi: 10.1109/TCSVT.2016.2630848. Epub 2016 Nov 18.
This paper proposes a bio-inspired visual motion estimation algorithm based on motion energy, together with a compact very-large-scale integration (VLSI) architecture for it that targets low-cost embedded systems. The algorithm mimics the motion perception functions of the retina, V1, and MT neurons in the primate visual system. It consists of ternary edge extraction, spatiotemporal filtering, motion energy extraction, and velocity integration. In addition, we propose a confidence map that indicates the reliability of the estimation result at each probing location. At runtime the algorithm involves only additions and multiplications, which makes it suitable for low-cost hardware implementation. The proposed VLSI architecture employs pipelining at multiple levels (frame, pixel, and operation) and massively parallel processing arrays to boost system performance. The array unit circuits are optimized to minimize hardware resource consumption. We have prototyped the proposed architecture on a low-cost field-programmable gate array platform (Zynq 7020) running at a 53-MHz clock frequency. The prototype achieved real-time velocity estimation at 30 frames/s on 160 × 120 probing locations. A comprehensive evaluation experiment showed that the velocity estimated by our prototype has relatively small errors (average endpoint error < 0.5 pixel and angular error < 10°) for most motion cases.
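To make the motion energy step more concrete, the following is a minimal sketch of a generic opponent motion-energy detector in one spatial dimension plus time. It is not the authors' implementation: the Gabor filter shape, the parameter values, and all function names (`gabor_pair`, `motion_energy`, `opponent_energy`, `drifting_grating`) are assumptions made for illustration, and the ternary edge extraction and velocity integration stages are omitted. The filter weights depend only on the offset, so they could be tabulated ahead of time, which would leave only multiply-accumulate and squaring operations at runtime.

```python
import math


def gabor_pair(x, t, fx, ft, sigma_x, sigma_t):
    """Even and odd (quadrature) spatiotemporal Gabor weights at offset (x, t).

    Illustrative filter shape only; the paper's actual filters may differ.
    """
    envelope = math.exp(-(x * x) / (2 * sigma_x ** 2) - (t * t) / (2 * sigma_t ** 2))
    phase = 2 * math.pi * (fx * x - ft * t)
    return envelope * math.cos(phase), envelope * math.sin(phase)


def motion_energy(stimulus, fx, ft, sigma_x=4.0, sigma_t=4.0):
    """Energy of one quadrature pair centred on the middle of stimulus[t][x]."""
    n_t, n_x = len(stimulus), len(stimulus[0])
    ct, cx = n_t // 2, n_x // 2
    even = 0.0
    odd = 0.0
    for t in range(n_t):
        for x in range(n_x):
            w_even, w_odd = gabor_pair(x - cx, t - ct, fx, ft, sigma_x, sigma_t)
            even += stimulus[t][x] * w_even  # multiply-accumulate
            odd += stimulus[t][x] * w_odd
    # Summing the squared quadrature responses removes the dependence on phase.
    return even * even + odd * odd


def opponent_energy(stimulus, fx, ft):
    """Rightward-tuned energy minus leftward-tuned energy."""
    rightward = motion_energy(stimulus, fx, ft)
    leftward = motion_energy(stimulus, fx, -ft)
    return rightward - leftward


def drifting_grating(n_t, n_x, fx, velocity):
    """Synthetic test stimulus: a sinusoid drifting at `velocity` pixels/frame."""
    return [
        [math.cos(2 * math.pi * fx * (x - velocity * t)) for x in range(n_x)]
        for t in range(n_t)
    ]


if __name__ == "__main__":
    fx = 0.125  # cycles/pixel (assumed)
    ft = 0.125  # cycles/frame, i.e. tuned to 1 pixel/frame
    for v in (1.0, 0.0, -1.0):
        grating = drifting_grating(25, 25, fx, v)
        print(v, opponent_energy(grating, fx, ft))
```

In this sketch the sign of the opponent energy follows the direction of the drifting grating, and a static grating yields a value close to zero. A full estimator would evaluate a bank of such detectors tuned to several speeds and directions and combine their outputs into a velocity.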
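The two error figures quoted in the abstract can be illustrated with a short sketch. The abstract does not define the metrics, so the definitions here are assumptions: endpoint error is taken as the Euclidean distance between the estimated and reference velocity vectors, and angular error as the angle between their 2-D directions. An alternative convention in the optical-flow literature measures the angle between the space-time vectors (u, v, 1), which gives different numbers. The function names are hypothetical.

```python
import math


def endpoint_error(est, ref):
    """Euclidean distance between estimated and reference velocity, in pixels."""
    return math.hypot(est[0] - ref[0], est[1] - ref[1])


def angular_error_deg(est, ref):
    """Angle in degrees between two non-zero 2-D velocity vectors.

    Assumed definition; the evaluated paper may use a different convention.
    """
    dot = est[0] * ref[0] + est[1] * ref[1]
    cross = est[0] * ref[1] - est[1] * ref[0]
    return math.degrees(math.atan2(abs(cross), dot))


def average_errors(estimates, references):
    """Mean endpoint error and mean angular error over paired velocity lists."""
    pairs = list(zip(estimates, references))
    aee = sum(endpoint_error(e, r) for e, r in pairs) / len(pairs)
    aae = sum(angular_error_deg(e, r) for e, r in pairs) / len(pairs)
    return aee, aae


if __name__ == "__main__":
    # Made-up velocities purely to exercise the functions.
    estimates = [(1.0, 0.0), (0.0, 2.0)]
    references = [(1.0, 0.0), (0.0, 2.5)]
    print(average_errors(estimates, references))
```

Under these definitions, an estimate would meet the thresholds stated in the abstract when the first returned value is below 0.5 and the second is below 10.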