Yang En-Hui, Amer Hossam, Jiang Yanbing
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
Entropy (Basel). 2021 Jul 10;23(7):881. doi: 10.3390/e23070881.
The impact of JPEG compression on deep learning (DL) in image classification is revisited. Given an underlying deep neural network (DNN) pre-trained with pristine ImageNet images, it is demonstrated that, if, for any original image, one can select, among its many JPEG compressed versions including its original version, a suitable version as an input to the underlying DNN, then the classification accuracy of the underlying DNN can be improved significantly while the size in bits of the selected input is, on average, reduced dramatically in comparison with the original image. This is in contrast to the conventional understanding that JPEG compression generally degrades the classification accuracy of DL. Specifically, for each original image, consider its 10 JPEG compressed versions with quality factor (QF) values from {100, 90, 80, 70, 60, 50, 40, 30, 20, 10}. Under the assumption that the ground truth label of the original image is known at the time of selecting an input, but unknown to the underlying DNN, we present a selector called the Highest Rank Selector (HRS). It is shown that HRS is optimal in the sense of achieving the highest Top k accuracy on any set of images for any k among all possible selectors. When the underlying DNN is Inception V3 or ResNet-50 V2, HRS improves, on average, the Top 1 classification accuracy and Top 5 classification accuracy on the whole ImageNet validation dataset by 5.6% and 1.9%, respectively, while reducing the input size in bits dramatically: the compression ratio (CR) between the size of the original images and the size of the input images selected by HRS is 8 for the whole ImageNet validation dataset. When the ground truth label of the original image is unknown at the time of selection, we further propose a new convolutional neural network (CNN) topology which is based on the underlying DNN and takes the original image and its 10 JPEG compressed versions as 11 parallel inputs. It is demonstrated that the proposed new CNN topology, even when partially trained, can consistently improve the Top 1 accuracy of Inception V3 and ResNet-50 V2 by approximately 0.4%, and the Top 5 accuracy of Inception V3 and ResNet-50 V2 by 0.32% and 0.2%, respectively. Other selectors without knowledge of the ground truth label of the original image are also presented. They maintain the Top 1 accuracy, the Top 5 accuracy, or both the Top 1 and Top 5 accuracy of the underlying DNN, while achieving CRs of 8.8, 3.3, and 3.1, respectively.
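To make the selection procedure concrete, the following is a minimal Python/PyTorch sketch of the Highest Rank Selector idea described above. It assumes a pretrained classifier model, a PIL image original, and its known ground-truth class index label; the helper names (jpeg_versions, highest_rank_select), the preprocessing pipeline, and the tie-breaking rule are illustrative assumptions, not the paper's implementation.

    import io
    from PIL import Image
    import torch
    import torchvision.transforms as T

    # The 10 JPEG quality factors considered in the abstract, plus the original image.
    QF_VALUES = [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]

    # Inception V3 expects 299x299 inputs; normalization is omitted for brevity.
    preprocess = T.Compose([T.Resize(342), T.CenterCrop(299), T.ToTensor()])

    def jpeg_versions(original):
        """Yield (tag, image) pairs: the original plus a JPEG re-encoding at each QF."""
        yield "original", original
        for qf in QF_VALUES:
            buf = io.BytesIO()
            original.save(buf, format="JPEG", quality=qf)
            buf.seek(0)
            yield qf, Image.open(buf).convert("RGB")

    @torch.no_grad()
    def highest_rank_select(model, original, label):
        """Return the version whose ground-truth class ranks highest in the model's
        output. The paper's exact tie-breaking rule may differ; here the first
        best-ranked version encountered is kept."""
        original = original.convert("RGB")  # JPEG encoding requires an RGB image
        best_tag, best_rank = None, None
        for tag, img in jpeg_versions(original):
            logits = model(preprocess(img).unsqueeze(0))
            # Rank of the ground-truth class: 0 means it is the top-1 prediction.
            rank = int((logits[0] > logits[0, label]).sum())
            if best_rank is None or rank < best_rank:
                best_tag, best_rank = tag, rank
        return best_tag, best_rank

With, for example, torchvision's pretrained Inception V3 in eval mode, highest_rank_select(model, img, label) indicates which version (original or a particular QF) to feed the network for that image; tallying the byte sizes of the selected versions over a dataset yields compression-ratio figures of the kind quoted above.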