CHEF: A Framework for Deploying Heterogeneous Models on Clusters with Heterogeneous FPGAs.

Author Information

Tang Yue, Song Yukai, Elango Naveena, Priya Sheena Ratnam, Jones Alex K, Xiong Jinjun, Zhou Peipei, Hu Jingtong

Affiliations

Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15261, USA.

Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY 14260, USA.

Publication Information

IEEE Trans Comput Aided Des Integr Circuits Syst. 2024 Nov;43(11):3937-3948. doi: 10.1109/tcad.2024.3438994. Epub 2024 Nov 6.

Abstract

DNNs are rapidly evolving from streamlined single-modality single-task (SMST) models to multi-modality multi-task (MMMT) models, with large variations across layers and complex data dependencies among layers. To support such models, hardware systems have also evolved to be heterogeneous. This heterogeneity follows the prevailing trend of integrating diverse accelerators into a system for lower latency. FPGAs offer high computation density and communication bandwidth, can be configured with different accelerator designs, and are widely used for various machine-learning applications. However, scaling from SMST to MMMT on heterogeneous FPGAs is challenging, since MMMT has much larger layer variations, a massive number of layers, and complex data dependencies among different backbones. Previous mapping algorithms are either inefficient or over-simplified, which makes them impractical in general scenarios. In this work, we propose CHEF to enable efficient implementation of MMMT models in realistic heterogeneous FPGA clusters, i.e., deploying heterogeneous accelerators on heterogeneous FPGAs (A2F) and mapping the heterogeneous DNNs onto the deployed heterogeneous accelerators (M2A). We propose CHEF-A2F, a two-stage accelerators-to-FPGAs deployment approach that co-optimizes hardware deployment and accelerator mapping. In addition, we propose CHEF-M2A, which can support general and practical cases compared to previous mapping algorithms. To the best of our knowledge, this is the first attempt to implement MMMT models in real heterogeneous FPGA clusters. Experimental results show that the latency obtained with CHEF is near-optimal, while the search time is 10,000X less than exhaustively searching for the optimal solution.
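
To make the M2A problem described above concrete, the toy Python sketch below illustrates the core combinatorial difficulty: assigning the layers (or backbones) of a heterogeneous model to heterogeneous accelerators so that overall latency is minimized. All accelerator names and latency numbers are hypothetical, and the exhaustive enumeration shown is only the brute-force baseline that CHEF's guided search avoids; it is not the authors' CHEF-M2A algorithm.

```python
from itertools import product

# Hypothetical per-layer latency (ms) of each backbone on each accelerator type;
# names and numbers are illustrative only, not measurements from the paper.
est_latency_ms = {
    "conv_backbone": {"systolic": 4.0, "spatial": 6.5},
    "attention":     {"systolic": 9.0, "spatial": 5.5},
    "mlp_head":      {"systolic": 2.0, "spatial": 2.5},
}
accels = ["systolic", "spatial"]
layers = list(est_latency_ms)

def makespan(assignment):
    # Layers placed on the same accelerator run back-to-back; different
    # accelerators run in parallel, so overall latency is the heaviest load.
    load = {a: 0.0 for a in accels}
    for layer, accel in assignment.items():
        load[accel] += est_latency_ms[layer][accel]
    return max(load.values())

# Exhaustive search enumerates |accels| ** |layers| assignments -- fine for this
# toy case (2**3 = 8), but intractable for MMMT models with hundreds of layers,
# which is why a guided search such as CHEF's is needed in practice.
best = min(
    (dict(zip(layers, choice)) for choice in product(accels, repeat=len(layers))),
    key=makespan,
)
print(best, f"{makespan(best):.1f} ms")
```

In this made-up instance the best assignment splits the convolutional backbone and the attention backbone across the two accelerator types, illustrating why per-layer heterogeneity makes the mapping choice non-trivial.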

Similar Articles

1. CHEF: A Framework for Deploying Heterogeneous Models on Clusters with Heterogeneous FPGAs. IEEE Trans Comput Aided Des Integr Circuits Syst. 2024 Nov;43(11):3937-3948. doi: 10.1109/tcad.2024.3438994. Epub 2024 Nov 6.

5. Toward Full-Stack Acceleration of Deep Convolutional Neural Networks on FPGAs. IEEE Trans Neural Netw Learn Syst. 2022 Aug;33(8):3974-3987. doi: 10.1109/TNNLS.2021.3055240. Epub 2022 Aug 3.

References Cited in This Article

1. Extending High-Level Synthesis for Task-Parallel Programs. Proc Annu IEEE Symp Field Program Cust Comput Mach. 2021 May;2021. doi: 10.1109/fccm51124.2021.00032. Epub 2021 Jun 2.
