College of Computer Engineering, Shangqiu Polytechnic, Shangqiu, China.
School of Civil and Architectural Engineering, Shandong University of Technology, Zibo, China.
PLoS One. 2024 Jan 5;19(1):e0292345. doi: 10.1371/journal.pone.0292345. eCollection 2024.
Canny edge detection applies many computationally intensive operations to the image, including Gaussian filtering, gradient calculation, non-maximum suppression, and double thresholding. These operations account for most of the running time and make real-time processing difficult. Traditional accelerated Canny implementations rely mainly on customized hardware such as DSPs and FPGAs, which suffer from long development cycles, difficult debugging, and high resource consumption. Implementations on the CUDA platform, meanwhile, have poor cross-platform portability. To address these problems, a fine-grained parallel Canny edge detection method based on OpenCL is proposed. It is optimized in three respects (task partitioning, vectorized memory access, and NDRange configuration) and realizes CPU-GPU collaborative parallelism. Parallel Canny edge detection methods based on a multi-core CPU and on the CUDA architecture are also designed and used for comparison. Experimental results show that the OpenCL-accelerated Canny edge detection algorithm (OCL_Canny) achieves a speedup of 20.68× over the serial CPU algorithm at an image resolution of 7452 × 8024, 3.96× over the multi-threaded CPU parallel Canny algorithm at 3500 × 3500, and 1.21× over the CUDA-based parallel Canny algorithm at 1024 × 1024. These results verify the effectiveness and performance portability of the proposed parallel algorithm and provide a reference for research on fast computation over large-scale image data.
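To make the four Canny stages named above concrete, here is a minimal sequential Python sketch of the textbook pipeline. It is not the authors' OCL_Canny code: the 5 × 5 Gaussian kernel, σ = 1.4, the Sobel operator, four-direction angle quantization, and the absolute thresholds low = 50 and high = 150 are illustrative assumptions. In the filtering, gradient, and suppression stages every output pixel depends only on a small input neighbourhood, which is the property a fine-grained OpenCL implementation can exploit by mapping one work-item to each pixel; hysteresis, by contrast, propagates along connected edges and is harder to parallelize.

```python
import math
from collections import deque

def gaussian_kernel(size=5, sigma=1.4):
    """Normalized size x size Gaussian kernel (assumed parameters)."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def correlate(img, kernel):
    """2-D correlation with clamp-to-edge borders; each output pixel is independent."""
    h, w, half = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky, krow in enumerate(kernel):
                yy = min(max(y + ky - half, 0), h - 1)
                for kx, v in enumerate(krow):
                    xx = min(max(x + kx - half, 0), w - 1)
                    acc += v * img[yy][xx]
            out[y][x] = acc
    return out

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def non_max_suppression(mag, ang):
    """Keep a pixel only if it is a local maximum along its quantized gradient direction."""
    h, w = len(mag), len(mag[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):          # border pixels are left at 0
        for x in range(1, w - 1):
            a = ang[y][x]              # degrees in [0, 180]; rows grow downward
            if a < 22.5 or a >= 157.5:
                dy, dx = 0, 1          # gradient ~horizontal: compare left/right
            elif a < 67.5:
                dy, dx = 1, 1          # ~45 deg: compare down-right/up-left
            elif a < 112.5:
                dy, dx = 1, 0          # ~vertical: compare below/above
            else:
                dy, dx = 1, -1         # ~135 deg: compare down-left/up-right
            m = mag[y][x]
            if m >= mag[y + dy][x + dx] and m >= mag[y - dy][x - dx]:
                out[y][x] = m
    return out

def hysteresis(nms, low, high):
    """Double threshold, then keep weak pixels 8-connected to a strong pixel."""
    h, w = len(nms), len(nms[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if nms[y][x] >= high:      # strong pixels seed the search
                edges[y][x] = True
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and not edges[ny][nx]
                        and nms[ny][nx] >= low):
                    edges[ny][nx] = True
                    queue.append((ny, nx))
    return edges

def canny(img, low=50.0, high=150.0, sigma=1.4):
    """Gaussian blur -> Sobel gradients -> NMS -> double threshold + hysteresis."""
    blurred = correlate(img, gaussian_kernel(5, sigma))
    gx = correlate(blurred, SOBEL_X)
    gy = correlate(blurred, SOBEL_Y)
    h, w = len(img), len(img[0])
    mag = [[math.hypot(gx[y][x], gy[y][x]) for x in range(w)] for y in range(h)]
    ang = [[math.degrees(math.atan2(gy[y][x], gx[y][x])) % 180.0 for x in range(w)]
           for y in range(h)]
    return hysteresis(non_max_suppression(mag, ang), low, high)
```

For an 8 × 8 image whose columns 0 to 3 are 0 and columns 4 to 7 are 255, `canny` marks edge pixels only in column 3 and/or column 4 of the interior rows, where the blurred step has its steepest gradient.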
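The abstract does not detail how work is divided between the CPU and the GPU. One common way to realize CPU-GPU collaborative processing of a stencil pipeline is a static row split with halo rows, sketched below in Python. Everything here is a hypothetical illustration rather than the paper's scheme: the names `Band`, `split_rows`, `cpu_fraction_from_throughput`, and `CANNY_HALO` are invented, and the halo of 4 rows assumes a 5 × 5 Gaussian (radius 2), a 3 × 3 Sobel operator (radius 1), and a 3 × 3 non-maximum-suppression window (radius 1), so an output row depends on input rows up to 4 away.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Band:
    device: str        # hypothetical device label, "cpu" or "gpu"
    rows: range        # output rows this device is responsible for
    read_rows: range   # input rows it must read, including the halo

def split_rows(height: int, cpu_fraction: float, halo: int) -> list[Band]:
    """Statically split image rows into a CPU band (top) and a GPU band (bottom).

    cpu_fraction is the share of rows given to the CPU; halo is the number of
    extra rows each band reads on each side so its stencil outputs match a
    whole-image run. Read ranges are clipped to the image.
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be in [0, 1]")
    cut = round(height * cpu_fraction)
    bands = []
    for device, lo, hi in (("cpu", 0, cut), ("gpu", cut, height)):
        if lo < hi:  # skip an empty band when one device gets every row
            read = range(max(lo - halo, 0), min(hi + halo, height))
            bands.append(Band(device, range(lo, hi), read))
    return bands

def cpu_fraction_from_throughput(cpu_rows_per_s: float, gpu_rows_per_s: float) -> float:
    """Share rows in proportion to measured throughput so both devices finish together."""
    return cpu_rows_per_s / (cpu_rows_per_s + gpu_rows_per_s)

# 5x5 Gaussian (radius 2) + 3x3 Sobel (radius 1) + 3x3 NMS window (radius 1)
CANNY_HALO = 2 + 1 + 1
```

With a GPU three times faster than the CPU, `split_rows(100, cpu_fraction_from_throughput(1.0, 3.0), CANNY_HALO)` gives the CPU output rows 0 to 24 (reading rows 0 to 28) and the GPU output rows 25 to 99 (reading rows 21 to 99). The halo only covers the local stencil stages; hysteresis can link edges across the cut, so a merge pass over the boundary rows, or a final whole-image hysteresis step, would still be needed.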