


An ASIP for Neural Network Inference on Embedded Devices with 99% PE Utilization and 100% Memory Hidden under Low Silicon Cost.

Affiliations

School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China.

Publication Info

Sensors (Basel). 2022 May 19;22(10):3841. doi: 10.3390/s22103841.

DOI: 10.3390/s22103841
PMID: 35632250
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9146143/
Abstract

The computation efficiency and flexibility of the accelerator hinder deep neural network (DNN) implementation in embedded applications. Although there are many publications on DNN processors, there is still much room for further optimization. Multiple dimensions must be considered simultaneously when designing a DNN processor to reach the performance limit of the architecture, including architecture decisions, flexibility, energy efficiency, and silicon cost minimization. Flexibility is defined as the ability to support as many networks as possible and to easily adjust their scale. For energy efficiency, there are large optimization opportunities in access minimization and memory latency minimization based on on-chip memory minimization. Therefore, this work focused on low-power, low-latency data access with minimized silicon cost. The design was implemented as an ASIP (application-specific instruction set processor) whose ISA is based on the Caffe2 inference operators, with the hardware built on a single instruction, multiple data (SIMD) architecture. The scalability and system performance of our SoC extension scheme were demonstrated. VLIW was used to execute multiple instructions in parallel, eliminating all data access time for the convolution layer. Finally, the processor was synthesized in TSMC 65 nm technology with a 200 MHz clock, and the SoC extension scheme was analyzed in an experimental model. Our design was tested on several typical neural networks, achieving 196 GOPS at 200 MHz and 241 GOPS/W on VGG16Net and AlexNet.
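The abstract's central claim — all data-access time eliminated for the convolution layer — is generally achieved by overlapping data transfers with PE compute. A minimal sketch of the double-buffering (ping-pong) cycle arithmetic behind that idea; the tile counts and cycle costs below are hypothetical illustrations, not figures from the paper:

```python
# Illustrative sketch (not the paper's pipeline): double-buffered tiling,
# the standard way an accelerator hides memory latency behind PE compute.

def total_cycles(n_tiles, load_cycles, compute_cycles, double_buffered):
    """Cycle count for processing n_tiles when each tile must be loaded
    before it can be computed."""
    if not double_buffered:
        # Serial schedule: every load stalls the PEs.
        return n_tiles * (load_cycles + compute_cycles)
    # Double-buffered schedule: tile i+1 is loaded while tile i is
    # computed, so only the very first load is exposed.
    per_tile = max(load_cycles, compute_cycles)
    return load_cycles + (n_tiles - 1) * per_tile + compute_cycles

# Hypothetical example: 64 tiles, 500-cycle loads, 800-cycle compute.
serial  = total_cycles(64, 500, 800, double_buffered=False)
overlap = total_cycles(64, 500, 800, double_buffered=True)
ideal   = 64 * 800  # pure compute, zero exposed memory cost
print(serial, overlap, ideal, f"utilization={ideal / overlap:.2%}")
```

When loads are no longer than compute, every transfer after the first is fully hidden, so PE utilization approaches 100% (above 99% in this toy case) — the general mechanism behind the "100% memory hidden" and "99% PE utilization" claims, not the paper's specific implementation.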


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/930b/9146143/28f7cdbfa490/sensors-22-03841-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/930b/9146143/b2b09d4f003d/sensors-22-03841-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/930b/9146143/72f87bdc797c/sensors-22-03841-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/930b/9146143/8b41d871b5ae/sensors-22-03841-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/930b/9146143/d2e6c2fc3790/sensors-22-03841-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/930b/9146143/61879827e829/sensors-22-03841-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/930b/9146143/3893d54dfa20/sensors-22-03841-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/930b/9146143/2ec5f6dfc230/sensors-22-03841-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/930b/9146143/7a62f86f698c/sensors-22-03841-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/930b/9146143/d418bc071c5f/sensors-22-03841-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/930b/9146143/2fc780404dc4/sensors-22-03841-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/930b/9146143/ccc3e3b84d36/sensors-22-03841-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/930b/9146143/da3d5ba3581b/sensors-22-03841-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/930b/9146143/4a98550169e6/sensors-22-03841-g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/930b/9146143/fe4e2dc28376/sensors-22-03841-g015.jpg

Similar Articles

1
An ASIP for Neural Network Inference on Embedded Devices with 99% PE Utilization and 100% Memory Hidden under Low Silicon Cost.
Sensors (Basel). 2022 May 19;22(10):3841. doi: 10.3390/s22103841.
2
A Heterogeneous RISC-V Processor for Efficient DNN Application in Smart Sensing System.
Sensors (Basel). 2021 Sep 28;21(19):6491. doi: 10.3390/s21196491.
3
An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications.
IEEE Trans Biomed Circuits Syst. 2015 Dec;9(6):838-48. doi: 10.1109/TBCAS.2015.2504563. Epub 2016 Jan 18.
4
Quantization Friendly MobileNet (QF-MobileNet) Architecture for Vision Based Applications on Embedded Platforms.
Neural Netw. 2021 Apr;136:28-39. doi: 10.1016/j.neunet.2020.12.022. Epub 2020 Dec 29.
5
SCA: Search-Based Computing Hardware Architecture with Precision Scalable and Computation Reconfigurable Scheme.
Sensors (Basel). 2022 Nov 6;22(21):8545. doi: 10.3390/s22218545.
6
A Heterogeneous Hardware Accelerator for Image Classification in Embedded Systems.
Sensors (Basel). 2021 Apr 9;21(8):2637. doi: 10.3390/s21082637.
7
UArch: A Super-Resolution Processor With Heterogeneous Triple-Core Architecture for Workloads of U-Net Networks.
IEEE Trans Biomed Circuits Syst. 2023 Jun;17(3):633-647. doi: 10.1109/TBCAS.2023.3261060. Epub 2023 Jul 12.
8
HybMED: A Hybrid Neural Network Training Processor With Multi-Sparsity Exploitation for Internet of Medical Things.
IEEE Trans Biomed Circuits Syst. 2024 Oct;18(5):1178-1189. doi: 10.1109/TBCAS.2024.3389875. Epub 2024 Sep 26.
9
An Application Specific Instruction Set Processor (ASIP) for Adaptive Filters in Neural Prosthetics.
IEEE/ACM Trans Comput Biol Bioinform. 2015 Sep-Oct;12(5):1034-47. doi: 10.1109/TCBB.2015.2440248.
10
Acceleration of Deep Neural Network Training Using Field Programmable Gate Arrays.
Comput Intell Neurosci. 2022 Oct 17;2022:8387364. doi: 10.1155/2022/8387364. eCollection 2022.

References Cited by This Article

1
Machine Learning Prediction of TiO2-Coating Wettability Tuned via UV Exposure.
ACS Appl Mater Interfaces. 2021 Sep 29;13(38):46171-46179. doi: 10.1021/acsami.1c13262. Epub 2021 Sep 15.