New Technology Development Department, Institute of Microelectronics of the Chinese Academy of Sciences, Beijing 100029, China.
University of Chinese Academy of Sciences, Beijing 100029, China.
Sensors (Basel). 2022 Mar 23;22(7):2471. doi: 10.3390/s22072471.
Among integer operations, division is generally regarded as low-frequency but high-latency. It is also the operation that most frequently stalls the processor pipeline. To improve the overall performance of embedded processors, a low-delay divider was designed. Based on the non-restoring algorithm, the divider uses a compound adder to perform addition and subtraction simultaneously, which shortens the delay of the iteration path. By shifting the operands to align their most significant bits, the divider dynamically adjusts the number of iteration cycles, reducing the average number of cycles per division. The design was simulated in ModelSim and implemented on an FPGA board for verification. Synthesized in a Semiconductor Manufacturing International Corporation (SMIC) 65 nm Low Leakage process, the design achieved a frequency of up to 500 MHz at an area cost of 5670.36 μm². Compared with other dividers, the proposed design reduces the delay of a single iteration by up to 45.3%, the average number of iteration cycles by 20-50%, and the area by 23.3-86.1%. Compared with other dividers implemented on FPGA, it uses 36.47-59.6% fewer LUTs and 67-84.28% fewer FFs, and runs 2-6.36 times faster. The proposed design is therefore suitable for embedded processors that require low power consumption, low resource consumption, and high performance.
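The two ideas named in the abstract, non-restoring iteration with a compound adder and operand alignment to shorten the iteration count, can be illustrated with a small behavioral model. The following Python sketch is not the authors' RTL. The function name `nonrestoring_divide`, the restriction to unsigned operands, and the choice to align by the difference in bit lengths are all assumptions made for illustration. The sum and difference are computed side by side to mimic a compound adder, and the sign of the previous partial remainder selects between them.

```python
def nonrestoring_divide(dividend: int, divisor: int) -> tuple:
    """Behavioral sketch (hypothetical helper, not the paper's hardware).

    Returns (quotient, remainder, iterations) for unsigned operands.
    """
    if dividend < 0 or divisor < 0:
        raise ValueError("this sketch handles unsigned operands only")
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if dividend < divisor:
        # Early exit: no iterations are needed at all.
        return 0, dividend, 0

    # Align the most significant bits of the two operands. The number of
    # iterations then depends on the operands, not on the word width.
    shift = dividend.bit_length() - divisor.bit_length()

    remainder = dividend
    quotient = 0
    iterations = 0
    for i in range(shift, -1, -1):
        aligned = divisor << i
        # Compound-adder model: both results are available together,
        # and the sign of the current partial remainder picks one.
        total = remainder + aligned
        difference = remainder - aligned
        remainder = difference if remainder >= 0 else total
        # Quotient bit is 1 when the new partial remainder is non-negative.
        quotient = (quotient << 1) | (1 if remainder >= 0 else 0)
        iterations += 1

    # Non-restoring division needs one final correction step when the
    # last partial remainder is negative.
    if remainder < 0:
        remainder += divisor
    return quotient, remainder, iterations
```

In this model, dividing 1000 by 7 takes 8 iterations, because the operands differ by 7 bits in length, whereas a fixed-length 32-bit divider would take 32. The actual cycle savings reported in the abstract depend on the hardware design and the operand distribution, which this sketch does not attempt to reproduce.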