Guo Licheng, Chi Yuze, Wang Jie, Lau Jason, Qiao Weikang, Ustun Ecenur, Zhang Zhiru, Cong Jason
University of California, Los Angeles.
Cornell University.
FPGA. 2021 Feb;2021:81-92. doi: 10.1145/3431920.3439289.
Despite the increasing adoption of high-level synthesis (HLS) for its design productivity advantages, there remains a significant gap in the achievable frequency between an HLS design and a handcrafted RTL one. A key factor that limits the timing quality of HLS outputs is the difficulty of accurately estimating interconnect delay at the HLS level. This problem becomes even worse when large HLS designs are implemented on the latest multi-die FPGAs. To tackle this challenge, we propose AutoBridge, an automated framework that couples a coarse-grained floorplanning step with pipelining during HLS compilation. First, our approach gives HLS a view of the global physical layout of the design, allowing HLS to more easily identify and pipeline long wires, especially those crossing die boundaries. Second, by exploiting the flexibility of HLS pipelining, the floorplanner can distribute the design logic across multiple dies on the FPGA device without degrading the clock frequency. This prevents the placer from aggressively packing the logic onto a single die, which often results in local routing congestion that eventually degrades timing. Since pipelining may introduce additional latency, we further present analysis and algorithms to ensure that the added latency does not compromise overall throughput. AutoBridge can be integrated into the existing CAD toolflow for Xilinx FPGAs. In our experiments with a total of 43 design configurations, we improve the average frequency from 147 MHz to 297 MHz (a 102% improvement) with no loss of throughput and a negligible change in resource utilization. Notably, in 16 experiments, designs that were originally unroutable achieve 274 MHz on average. The tool is available at https://github.com/Licheng-Guo/AutoBridge.
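The abstract describes two coupled steps: pipelining the wires that cross floorplan boundaries, and making sure the added latency does not cost throughput. The sketch below illustrates both ideas under loudly stated assumptions and is not AutoBridge's actual implementation. It assumes a hypothetical placement that maps each dataflow module to a `(row, col)` slot on a grid of dies/regions. It inserts one register stage per slot boundary crossed, using Manhattan distance as a stand-in. It then pads FIFO depth on the other branches of reconvergent paths so that every path into a module carries the same added latency. The names `pipeline_stages`, `balance_latency`, and `regs_per_boundary` are illustrative. The balancing shown is a simple greedy ASAP pass over a DAG, and the paper's own formulation may differ, for example by minimizing total buffering cost.

```python
from collections import defaultdict, deque


def pipeline_stages(placement, edges, regs_per_boundary=1):
    """Hypothetical rule: one register stage per slot boundary an edge crosses.

    placement maps module -> (row, col) slot; Manhattan distance between slots
    stands in for the number of boundaries (e.g. die crossings) on the wire.
    """
    stages = {}
    for u, v in edges:
        (r1, c1), (r2, c2) = placement[u], placement[v]
        stages[(u, v)] = regs_per_boundary * (abs(r1 - r2) + abs(c1 - c2))
    return stages


def balance_latency(nodes, edges, stages):
    """Greedy ASAP latency balancing on a DAG of FIFO-connected modules.

    Returns the extra FIFO depth to add on each edge so that all paths
    reaching a module carry the same added pipeline latency.
    """
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # arrival[n]: largest added latency on any path from a source to n
    arrival = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while queue:  # Kahn's topological traversal
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + stages[(u, v)])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(nodes):
        raise ValueError("graph has a cycle; this sketch handles DAGs only")

    # Padding is non-negative because arrival[v] >= arrival[u] + stages[(u, v)]
    return {(u, v): arrival[v] - arrival[u] - stages[(u, v)] for u, v in edges}
```

For example, suppose modules `A`, `B`, and `D` sit in slot `(0, 0)` and `C` sits in slot `(2, 0)`, with edges `A→B`, `A→C`, `B→D`, and `C→D`. Then the `A→C→D` branch receives 4 register stages in total, and `balance_latency` asks for 4 extra FIFO slots on `B→D` so the two reconvergent branches stay in step.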