Performance estimation for the memristor-based computing-in-memory implementation of extremely factorized network for real-time and low-power semantic segmentation.

Affiliations

Institute for Advanced Materials, South China Academy of Advanced Optoelectronics, South China Normal University, Guangzhou, 510006, China; Guangdong Provincial Key Laboratory of Optical Information Materials and Technology, South China Academy of Advanced Optoelectronics, South China Normal University, Guangzhou, 510006, China.

Publication Information

Neural Netw. 2023 Mar;160:202-215. doi: 10.1016/j.neunet.2023.01.008. Epub 2023 Jan 13.

Abstract

Nowadays, many semantic segmentation algorithms achieve satisfactory accuracy on von Neumann platforms (e.g., GPUs), but their speed and energy consumption have not yet met the demanding requirements of edge applications such as autonomous driving. To tackle this issue, it is necessary to design an efficient, lightweight semantic segmentation algorithm and then implement it on emerging hardware platforms that offer high speed and energy efficiency. Here, we first propose an extremely factorized network (EFNet) that learns multi-scale context information while preserving rich spatial information at reduced model complexity. Experimental results on the Cityscapes dataset show that EFNet achieves an accuracy of 68.0% mean intersection over union (mIoU) with only 0.18M parameters, at a speed of 99 frames per second (FPS) on a single RTX 3090 GPU. Then, to further improve speed and energy efficiency, we design a memristor-based computing-in-memory (CIM) accelerator for the hardware implementation of EFNet. Simulation in DNN+NeuroSim V2.0 shows that the memristor-based CIM accelerator is ∼63× (∼4.6×) smaller in area, up to ∼9.2× (∼1000×) faster, and ∼470× (∼2400×) more energy-efficient than the RTX 3090 GPU (the Jetson Nano embedded development board), although its accuracy decreases slightly, by 1.7% mIoU. Therefore, the memristor-based CIM accelerator has great potential for deployment at the edge to run lightweight semantic segmentation models such as EFNet. This study showcases an algorithm-hardware co-design that realizes real-time and low-power semantic segmentation at the edge.
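
The abstract does not spell out EFNet's building block, only that the network is "extremely factorized" and learns multi-scale context while preserving spatial information. As an illustration of what convolution factorization buys in a lightweight segmentation backbone, the following is a minimal PyTorch sketch of a generic factorized residual block built from an asymmetric (3x1 / 1x3) pair plus a dilated pair; the class name, layer ordering, and dilation scheme are assumptions for illustration, not the paper's actual EFNet design.

```python
import torch
import torch.nn as nn


class FactorizedBlock(nn.Module):
    """Illustrative factorized residual block (hypothetical; not the paper's exact EFNet unit).

    A 3x3 convolution is factorized into a 3x1 followed by a 1x3 convolution;
    a second, dilated pair enlarges the receptive field for multi-scale context.
    """

    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        # Plain asymmetric pair (receptive field equivalent to one 3x3 conv).
        self.conv3x1_1 = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0), bias=False)
        self.conv1x3_1 = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1), bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        # Dilated asymmetric pair for multi-scale context.
        self.conv3x1_2 = nn.Conv2d(channels, channels, (3, 1),
                                   padding=(dilation, 0), dilation=(dilation, 1), bias=False)
        self.conv1x3_2 = nn.Conv2d(channels, channels, (1, 3),
                                   padding=(0, dilation), dilation=(1, dilation), bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv3x1_1(x))
        out = self.relu(self.bn1(self.conv1x3_1(out)))
        out = self.relu(self.conv3x1_2(out))
        out = self.bn2(self.conv1x3_2(out))
        # Residual connection helps preserve spatial detail at no parameter cost.
        return self.relu(out + x)


if __name__ == "__main__":
    block = FactorizedBlock(channels=64, dilation=2)
    y = block(torch.randn(1, 64, 128, 256))
    print(y.shape)  # torch.Size([1, 64, 128, 256]) -- spatial size preserved
    n_params = sum(p.numel() for p in block.parameters())
    print(n_params)  # ~49k weights vs. ~74k for two full 3x3 convolutions at 64 channels
```

Factorizing a 3x3 convolution into 3x1 and 1x3 kernels cuts the per-layer weight count from 9·C² to 6·C², the dilated pair enlarges the receptive field for multi-scale context, and the residual connection helps preserve spatial detail. For context on the reported speed, 99 FPS corresponds to roughly 10 ms per frame on the RTX 3090, so the simulated ∼9.2× speedup of the CIM accelerator would correspond to roughly 1.1 ms per frame.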
