Oh Seokjin, An Jiyong, Min Kyeong-Sik
School of Electrical Engineering, Kookmin University, Seoul 02707, Republic of Korea.
Micromachines (Basel). 2023 Jan 25;14(2):309. doi: 10.3390/mi14020309.
Memristor crossbars can be very useful for realizing edge-intelligence hardware, because neural networks implemented with memristor crossbars can save significantly more computing energy and layout area than conventional CMOS (complementary metal-oxide-semiconductor) digital circuits. One of the important operations used in neural networks is convolution. To perform convolution with memristor crossbars, the full image should be partitioned into several sub-images. By doing so, each sub-image convolution can be mapped to small unit crossbars, whose size should be limited to 128 × 128 or 256 × 256 to avoid the line-resistance problem caused by large crossbars. In this paper, various convolution schemes with 3D, 2D, and 1D kernels are analyzed and compared in terms of neural-network performance and overlapping overhead. The neural-network simulation indicates that the 2D + 1D kernels can perform the sub-image convolution using far fewer unit crossbars with less rate loss than the 3D kernels. When the CIFAR-10 dataset is tested, mapping the sub-image convolution of 2D + 1D kernels to crossbars shows that the number of unit crossbars can be reduced by almost 90% and 95% for 128 × 128 and 256 × 256 crossbars, respectively, compared with the 3D kernels, while the rate loss of the 2D + 1D kernels remains below 2%. To improve the neural network's performance further, the 2D + 1D kernels can be combined with 3D kernels in one neural network. When the normalized ratio of 2D + 1D layers is around 0.5, the network shows very little rate loss compared with a normalized ratio of zero, yet it needs only half as many unit crossbars as the network with a normalized ratio of 0.
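As a rough illustration of why the 2D + 1D factorization needs fewer unit crossbars, the Python sketch below counts the unit crossbars required to tile each layer's weight matrix. It assumes the 2D + 1D kernels correspond to a depthwise (2D) plus pointwise (1D) factorization, and it ignores mapping details the paper accounts for (overlapping overhead, sign handling, bias rows), so the exact savings it prints differ from the figures reported above; the layer shape used in the example is hypothetical.

```python
import math

def unit_xbars(rows: int, cols: int, size: int) -> int:
    """Unit crossbars needed to tile a rows x cols weight matrix."""
    return math.ceil(rows / size) * math.ceil(cols / size)

def xbars_3d(k: int, c_in: int, c_out: int, size: int) -> int:
    """Standard 3D-kernel layer: the kernel unrolls to a
    (k*k*c_in) x c_out matrix -- one crossbar row per patch element,
    one column per output channel."""
    return unit_xbars(k * k * c_in, c_out, size)

def xbars_2d_1d(k: int, c_in: int, c_out: int, size: int) -> int:
    """Assumed 2D + 1D factorization (depthwise + pointwise):
    the 2D stage packs one k x k kernel per column, a (k*k) x c_in
    matrix; the 1D stage is a c_in x c_out pointwise matrix."""
    depthwise = unit_xbars(k * k, c_in, size)
    pointwise = unit_xbars(c_in, c_out, size)
    return depthwise + pointwise

if __name__ == "__main__":
    # Hypothetical CIFAR-10-scale layer: 3x3 kernels, 256 -> 256 channels.
    for size in (128, 256):
        n3d = xbars_3d(3, 256, 256, size)
        n2d1d = xbars_2d_1d(3, 256, 256, size)
        saving = 100 * (1 - n2d1d / n3d)
        print(f"{size}x{size} crossbars: 3D = {n3d}, "
              f"2D+1D = {n2d1d}, saving = {saving:.0f}%")
```

Under these simplified assumptions the saving is about 83% for 128 × 128 and 78% for 256 × 256 unit crossbars; the counts depend strongly on layer shapes and on the chosen mapping, which is why the paper's per-network figures (roughly 90% and 95%) come out higher.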