ReStAC - UAV-Borne Real-Time SGM Stereo Optimized for Embedded ARM and CUDA Devices.

Authors

Ruf Boitumelo, Mohrs Jonas, Weinmann Martin, Hinz Stefan, Beyerer Jürgen

Affiliations

Fraunhofer Center for Machine Learning, Fraunhofer Institute of Optronics, System Technologies and Image Exploitation (IOSB), 76131 Karlsruhe, Germany.

Institute of Photogrammetry and Remote Sensing, Karlsruhe Institute of Technology (KIT), 76131 Karlsruhe, Germany.

Publication

Sensors (Basel). 2021 Jun 7;21(11):3938. doi: 10.3390/s21113938.

Abstract

With the emergence of low-cost robotic systems, such as unmanned aerial vehicles (UAVs), embedded high-performance image processing has become increasingly important. For a long time, FPGAs were the only processing hardware capable of high-performance computing while preserving the low power consumption that is essential for embedded systems. However, the increasing availability of embedded GPU-based systems in recent years, such as the NVIDIA Jetson series, which combine an ARM CPU with an NVIDIA Tegra GPU, now allows for massively parallel embedded computing on graphics hardware. With this in mind, we propose an approach for real-time embedded stereo processing on ARM and CUDA-enabled devices, based on the popular and widely used Semi-Global Matching (SGM) algorithm. Specifically, we optimize the algorithm for embedded CUDA GPUs by means of massively parallel computing, and we use NEON intrinsics to optimize it for vectorized SIMD processing on embedded ARM CPUs. We evaluated our approach with different configurations on two public stereo benchmark datasets and show that it reaches an error rate as low as 3.3%. Furthermore, our experiments show that the fastest configuration of our approach reaches up to 46 FPS at VGA image resolution. Finally, in a use-case-specific qualitative evaluation, we measured the power consumption of our approach and deployed it on a DJI Manifold 2-G attached to a DJI Matrice 210v2 RTK UAV, demonstrating its suitability for real-time stereo processing onboard a UAV.
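To make the parallelization argument concrete, the core step of SGM in Hirschmüller's standard formulation can be sketched as cost aggregation along one 1D path. The sketch below is a minimal, illustrative C++ version for a single image row traversed left to right. It assumes a precomputed pixel-wise cost volume stored as `cost[x * D + d]` and smoothness penalties `P1` and `P2`. The function name and data layout are hypothetical, and this is not the authors' CUDA or NEON implementation. It shows that, once the minimum over the previous pixel is known, every disparity can be updated independently. That independence is the kind of structure that SIMD lanes or GPU threads can exploit.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Aggregate matching costs along one left-to-right path for a single image row.
// cost[x * D + d] is the (hypothetical) pixel-wise matching cost C(p, d) for
// column x and disparity d; D is the number of disparity candidates.
// Assumes width >= 1 and D >= 1. Returns L_r(x, d) in the same layout.
std::vector<uint32_t> aggregateRowLeftToRight(const std::vector<uint16_t>& cost,
                                              int width, int D,
                                              uint32_t P1, uint32_t P2) {
    std::vector<uint32_t> L(cost.size());
    // First pixel on the path has no predecessor, so L_r equals the raw cost.
    for (int d = 0; d < D; ++d) L[d] = cost[d];

    for (int x = 1; x < width; ++x) {
        const uint32_t* prev = &L[(x - 1) * D];
        uint32_t* cur = &L[x * D];
        // min_k L_r(p - r, k): used for the P2 term and for normalization.
        const uint32_t minPrev = *std::min_element(prev, prev + D);
        // Each disparity reads only the previous pixel's values, so iterations
        // of this loop do not depend on each other.
        for (int d = 0; d < D; ++d) {
            uint32_t best = prev[d];                                  // same disparity
            if (d > 0)     best = std::min(best, prev[d - 1] + P1);   // small change
            if (d < D - 1) best = std::min(best, prev[d + 1] + P1);   // small change
            best = std::min(best, minPrev + P2);                      // large jump
            // best >= minPrev always holds, so this subtraction cannot underflow;
            // it keeps L_r bounded along long paths.
            cur[d] = cost[x * D + d] + best - minPrev;
        }
    }
    return L;
}
```

In full SGM, the per-path results from several directions (for example 4 or 8) are summed, and each pixel takes the disparity with the lowest summed cost (winner-takes-all). The recurrence above is the unit that gets repeated for each path.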
