QuantLaneNet: A 640-FPS and 34-GOPS/W FPGA-Based CNN Accelerator for Lane Detection.

Author information

Lam Duc Khai, Du Cam Vinh, Pham Hoai Luan

Affiliations

Computer Engineering Department, University of Information Technology, Ho Chi Minh City 700000, Vietnam.

Vietnam National University, Ho Chi Minh City 700000, Vietnam.

Publication information

Sensors (Basel). 2023 Jul 25;23(15):6661. doi: 10.3390/s23156661.

DOI:10.3390/s23156661

PMID:37571445

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10422460/

Abstract

Lane detection is one of the most fundamental problems in the rapidly developing field of autonomous vehicles. With the dramatic growth of deep learning in recent years, many models have achieved high accuracy on this task. However, most existing deep-learning methods for lane detection face two main problems. First, most early studies follow a segmentation approach, which requires heavy post-processing to extract the necessary geometric information about the lane lines. Second, many models fail to reach real-time speed because of the high complexity of their architectures. To address these problems, this paper proposes a lightweight convolutional neural network that needs only two small arrays, rather than segmentation maps, and therefore only minimal post-processing for lane detection. The network uses a simple lane representation format for its output. The proposed model achieves 93.53% accuracy on the TuSimple dataset. A hardware accelerator is proposed and implemented on the Virtex-7 VC707 FPGA platform to optimize processing time and power consumption. The accelerator architecture applies several techniques: data quantization to reduce the data width to 8 bits, different loop-unrolling strategies for different convolution layers, and pipelined computation across layers. This implementation processes 640 FPS while consuming only 10.309 W, which equates to a computation throughput of 345.6 GOPS and an energy efficiency of 33.52 GOPS/W.
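The headline figures in the abstract are related by simple arithmetic, and the short Python sketch below works through it. Only the three input values (640 FPS, 10.309 W, 345.6 GOPS) come from the abstract. The function names and the derived per-frame quantities are illustrative assumptions, not figures reported by the authors.

```python
# Figures stated in the abstract.
FPS = 640.0               # frames processed per second
POWER_W = 10.309          # power consumption in watts
THROUGHPUT_GOPS = 345.6   # computation throughput in giga-operations per second


def energy_efficiency(gops: float, watts: float) -> float:
    """Energy efficiency in GOPS/W: throughput divided by power."""
    return gops / watts


def ops_per_frame(gops: float, fps: float) -> float:
    """Giga-operations per frame implied by throughput and frame rate."""
    return gops / fps


def frame_period_ms(fps: float) -> float:
    """Time between consecutive output frames in milliseconds.

    In a pipelined design this is the output interval, which is not
    necessarily the end-to-end latency of a single frame.
    """
    return 1000.0 / fps


def energy_per_frame_mj(watts: float, fps: float) -> float:
    """Energy spent per frame in millijoules (watts = joules per second)."""
    return 1000.0 * watts / fps


if __name__ == "__main__":
    print(f"Energy efficiency: {energy_efficiency(THROUGHPUT_GOPS, POWER_W):.2f} GOPS/W")
    print(f"Work per frame:    {ops_per_frame(THROUGHPUT_GOPS, FPS):.2f} GOP")
    print(f"Frame period:      {frame_period_ms(FPS):.4f} ms")
    print(f"Energy per frame:  {energy_per_frame_mj(POWER_W, FPS):.2f} mJ")
```

Dividing 345.6 GOPS by 10.309 W reproduces the 33.52 GOPS/W quoted in the abstract. The remaining values (about 0.54 GOP, 1.5625 ms, and roughly 16 mJ per frame) are derived here only as a consistency check.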
