Wang Peisong, He Xiangyu, Cheng Jian
IEEE Trans Neural Netw Learn Syst. 2022 May 27;PP. doi: 10.1109/TNNLS.2022.3173498.
While binarized neural networks (BNNs) have attracted great interest, popular approaches proposed so far mainly exploit the symmetric sign function for feature binarization, i.e., they binarize activations into -1 and +1 with a fixed threshold of 0. However, whether this choice is optimal has been largely overlooked. In this work, we propose the Sparsity-inducing BNN (Si-BNN), which quantizes activations to either 0 or +1, better approximating ReLU with a single bit. We further introduce trainable thresholds into the backward function of binarization to guide gradient propagation. Our method dramatically outperforms the current state-of-the-art, narrowing the performance gap between full-precision networks and BNNs on mainstream architectures and achieving new state-of-the-art results with binarized AlexNet (Top-1 50.5%), ResNet-18 (Top-1 62.2%), and ResNet-50 (Top-1 68.3%). At inference time, Si-BNN still enjoys the high efficiency of bit-wise operations. In our implementation, the running time of binary AlexNet on a CPU is competitive with popular GPU-based deep learning frameworks.
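The core idea from the abstract can be sketched as follows: a forward pass that binarizes activations to {0, +1} (rather than the symmetric {-1, +1} of the sign function), and a backward pass that passes gradients only inside a threshold window. This is a minimal illustrative sketch, not the authors' exact formulation; the window bounds `lower` and `upper` stand in for the paper's trainable thresholds, and the function names are hypothetical.

```python
def si_binarize_forward(xs, threshold=0.0):
    """Forward pass: quantize each activation to 0 or +1,
    approximating ReLU with a single bit."""
    return [1.0 if x > threshold else 0.0 for x in xs]

def si_binarize_backward(xs, grad_out, lower=0.0, upper=1.0):
    """Backward pass sketch: a straight-through-style estimator that
    propagates the incoming gradient only where the pre-activation
    lies inside the window [lower, upper]. In the paper these bounds
    are trainable; here they are fixed for illustration."""
    return [g if lower <= x <= upper else 0.0
            for x, g in zip(xs, grad_out)]

# Example: activations outside the window get zero gradient.
acts = [-0.7, 0.2, 0.9, 1.5]
out = si_binarize_forward(acts)                      # [0.0, 1.0, 1.0, 1.0]
grads = si_binarize_backward(acts, [1.0] * 4)        # [0.0, 1.0, 1.0, 0.0]
```

In a real BNN training loop these two functions would form a custom autograd operation, with the forward producing the 0/+1 codes used by bit-wise kernels at inference.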