Zhou Xinbing, Hao Peng, Liu Dake
School of Information and Communication Engineering, Hainan University, Haikou 570228, China.
School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China.
Micromachines (Basel). 2023 Feb 21;14(3):501. doi: 10.3390/mi14030501.
Hundreds of processor cores or modules are now integrated into a single chip. Traditional buses and crossbars are constrained by bandwidth, scalability, and silicon area, and cannot meet the requirements of high-end applications. Network-on-chip (NoC) has become a very promising interconnection structure because of its good scalability, predictable interconnect length and delay, high bandwidth, and reusability. However, most available packet-routing NoCs may not be the ideal solution for high-end heterogeneous multi-core real-time systems-on-chip (SoCs) because of their excessive latency and cache cost overhead. Circuit switching, in turn, is limited in scale and connectivity flexibility, and incurs excessive overhead in fully connected systems. To solve these problems and to meet the need for low latency, high throughput, and flexibility, this paper proposes PCCNoC (Packet Connected Circuit NoC), a low-latency, low-overhead NoC that combines packet switching (to set up a circuit) with circuit switching (to transmit data over that circuit). It offers flexible routing and zero latency overhead during data transmission, making it suitable for high-end heterogeneous multi-core real-time SoCs at various system scales. Compared with typical available packet-switched NoCs, PCCNoC achieves a 242% performance improvement and a 97% latency reduction while keeping the silicon cost relatively low.