Artificial Intelligence and Information Analysis Laboratory, Aristotle University of Thessaloniki, Greece.
Photonic Systems and Networks Research Group, Department of Informatics, Aristotle University of Thessaloniki, Greece.
Neural Netw. 2020 Sep;129:103-108. doi: 10.1016/j.neunet.2020.05.024. Epub 2020 Jun 1.
Photonics is among the most promising emerging technologies for fast and energy-efficient Deep Learning (DL) implementations. Despite their advantages, photonic DL accelerators also come with certain important limitations. For example, most existing photonic accelerators do not currently support many of the activation functions commonly used in DL, such as the ReLU activation function. Instead, sinusoidal and sigmoidal nonlinearities are usually employed, rendering the training process unstable and difficult to tune, mainly due to vanishing gradient phenomena. As a result, photonic DL models usually require careful fine-tuning of all their training hyper-parameters to ensure that training proceeds smoothly. Despite recent advances in initialization schemes and optimization algorithms, training photonic DL models remains especially challenging. To overcome these limitations, we propose a novel adaptive initialization method that employs auxiliary tasks to estimate the optimal initialization variance for each layer of a network. The effectiveness of the proposed approach is demonstrated using two different datasets, two recently proposed photonic activation functions, and three different initialization methods. Apart from significantly increasing the stability of the training process, the proposed method can be directly used with any photonic activation function without requiring any further fine-tuning, as also demonstrated through the conducted experiments.
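To make the idea of auxiliary-task-driven, per-layer variance estimation concrete, the following is a minimal sketch in Python/NumPy. It assumes a sinusoidal photonic-style nonlinearity (here sin^2) and uses a simple layer-wise search that picks, from a grid of candidate initialization standard deviations, the one whose resulting activation statistics on a small auxiliary batch stay closest to a target value. The function names, the sin^2 form, the variance-matching criterion, and the target value are illustrative assumptions for exposition, not the authors' published algorithm.

```python
import numpy as np

def photonic_sin_activation(x):
    # Illustrative sinusoidal nonlinearity of the kind used in photonic
    # accelerators (assumed form; the paper's exact activations may differ).
    return np.sin(x) ** 2

def adaptive_init(layer_sizes, x_aux, target_std=0.25,
                  candidate_stds=np.logspace(-2, 0, 20), seed=0):
    """Layer-wise search for an initialization std that keeps the activation
    statistics on an auxiliary batch x_aux close to target_std.

    This is a hedged illustration of estimating the initialization variance
    per layer from an auxiliary objective, not the published method.
    """
    rng = np.random.default_rng(seed)
    weights = []
    h = x_aux
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        best_w, best_gap = None, np.inf
        for s in candidate_stds:
            w = rng.normal(0.0, s, size=(fan_in, fan_out))
            a = photonic_sin_activation(h @ w)
            gap = abs(a.std() - target_std)  # distance to the target statistic
            if gap < best_gap:
                best_w, best_gap = w, gap
        weights.append(best_w)
        # Propagate the auxiliary batch through the chosen layer before
        # estimating the next layer's initialization.
        h = photonic_sin_activation(h @ best_w)
    return weights

# Example: estimate per-layer initializations for a 784-256-128-10 network,
# using 512 random samples as the auxiliary batch.
x_aux = np.random.default_rng(1).normal(size=(512, 784))
weights = adaptive_init([784, 256, 128, 10], x_aux)
print([round(float(w.std()), 4) for w in weights])
```

In this sketch the "auxiliary task" is reduced to matching a simple activation statistic; the paper's approach is stated more generally, and any per-layer criterion computable on a small batch could be substituted for the variance-matching step.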