Appl Opt. 2022 Mar 20;61(9):2173-2183. doi: 10.1364/AO.435738.
In recent years, convolutional neural networks (CNNs) have enabled ubiquitous image processing applications. As such, CNNs require fast forward-propagation runtimes to process high-resolution visual streams in real time. This remains a challenging task even with state-of-the-art graphics and tensor processing units. The bottleneck in computational efficiency lies primarily in the convolutional layers. Performing convolutions in the Fourier domain is a promising way to accelerate forward propagation, since it transforms convolutions into elementwise multiplications, which are considerably faster to compute for large kernels. Furthermore, such computation can be implemented with an optical 4f system operating orders of magnitude faster. However, a major challenge in using this spectral approach, as well as in any optical implementation of CNNs, is the nonlinearity required between the convolutional layers, without which CNN performance drops dramatically. Here, we propose a spectral CNN linear counterpart (SCLC) network architecture and its optical implementation. We propose a hybrid platform with an optical front end that performs a large number of linear operations, followed by an electronic back end. The key contribution is a knowledge distillation (KD) approach that circumvents the need for nonlinear layers between the convolutional layers and successfully trains such networks. While the KD approach is known in machine learning as an effective process for network pruning, we adapt it to transfer knowledge from a nonlinear network (teacher) to a linear counterpart (student), where we can exploit the inherent parallelism of light. We show that the KD approach achieves performance that easily surpasses the standard linear version of a CNN and can approach the performance of the nonlinear network. Our simulations show that the possibility of increasing the resolution of the input image allows our proposed 4f optical linear network to perform more efficiently than a nonlinear network with the same accuracy on two fundamental image processing tasks: (i) object classification and (ii) semantic segmentation.
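As a minimal numerical illustration of the spectral approach described in the abstract (not the authors' implementation), the sketch below verifies the convolution theorem: a spatial convolution equals the inverse transform of the elementwise product of Fourier transforms, which is the operation a 4f optical front end performs in parallel. Image and kernel sizes are arbitrary assumptions.

    import numpy as np
    from scipy.signal import convolve2d

    rng = np.random.default_rng(0)
    image = rng.standard_normal((128, 128))
    kernel = rng.standard_normal((15, 15))

    # Zero-pad both inputs to the full linear-convolution size before transforming.
    out_shape = (image.shape[0] + kernel.shape[0] - 1,
                 image.shape[1] + kernel.shape[1] - 1)

    # Spectral path: elementwise product in the Fourier domain, then inverse transform.
    spectral = np.fft.irfft2(np.fft.rfft2(image, out_shape) *
                             np.fft.rfft2(kernel, out_shape), out_shape)

    # Direct spatial convolution for comparison.
    spatial = convolve2d(image, kernel, mode="full")

    assert np.allclose(spectral, spatial)

For large kernels the elementwise product scales far better than sliding-window convolution, which is the efficiency argument behind computing this product optically.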
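The knowledge distillation step can likewise be sketched in a few lines. The toy PyTorch code below (assumed architectures and hyperparameters, not the paper's networks) distills a nonlinear CNN teacher into a purely linear student whose stacked convolutions contain no activations and could therefore be folded into Fourier-domain products; the loss combines Hinton-style soft targets with the hard labels.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Nonlinear teacher: convolutions interleaved with ReLU activations.
    teacher = nn.Sequential(
        nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(),
        nn.Conv2d(16, 16, 5, padding=2), nn.ReLU(),
        nn.Flatten(), nn.Linear(16 * 28 * 28, 10),
    )

    # Linear student: same convolutional structure, no activations in between.
    student = nn.Sequential(
        nn.Conv2d(1, 16, 5, padding=2),
        nn.Conv2d(16, 16, 5, padding=2),
        nn.Flatten(), nn.Linear(16 * 28 * 28, 10),
    )

    def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        # Soft-target term: KL divergence between temperature-scaled distributions.
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits / T, dim=1),
                        reduction="batchmean") * T * T
        # Hard-label term: ordinary cross-entropy on the ground truth.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

    # One hypothetical training step on a batch of 28x28 images.
    x = torch.randn(8, 1, 28, 28)
    y = torch.randint(0, 10, (8,))
    with torch.no_grad():
        t_logits = teacher(x)
    loss = kd_loss(student(x), t_logits, y)
    loss.backward()

Because the student stays linear end to end between samplings, its trained weights can in principle be mapped onto the optical 4f front end, with only the lightweight electronic back end computed digitally.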