Zhang Jingxuan, Chen Zhihua, Dai Lei, Li Ping, Sheng Bin
Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China.
Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China; School of Design, The Hong Kong Polytechnic University, Hong Kong, China.
Neural Netw. 2025 Nov;191:107819. doi: 10.1016/j.neunet.2025.107819. Epub 2025 Jul 5.
Dataset distillation (DD) aims to construct a dataset far smaller than the original, cumbersome one, such that models trained on either dataset achieve nearly the same accuracy on the test set. Previous work based on the gradient matching (GM) framework achieved suboptimal performance because it matched only the gradients induced by correct labels, neglecting the model's surprise at incorrect answers. In this paper, we aim to produce more informative gradients during the matching process and present a novel DD framework that leverages label cycle shifting. Specifically, pre-trained neural networks process deliberately mismatched image-label pairs, generating diverse and substantial gradients during backpropagation of the cross-entropy loss. Furthermore, GM with larger gradients tends to converge faster than conventional GM approaches, which prompts us to propose an early-exit mechanism. To further enhance performance, we apply an exponential moving average to the distilled dataset as an ensemble strategy and introduce distribution matching into the total matching objective. We show that the model implicitly incorporates gradient experience from past rounds, and we analyze the mechanisms by which gradient matching and distribution matching mutually reinforce each other. Our design outperforms most previous DD methods with fewer training iterations. Experiments on benchmark datasets (CIFAR10, CIFAR100, TinyImageNet, and a subset of ImageNet) demonstrate the effectiveness of our method.
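The core intuition behind label cycle shifting can be illustrated with a minimal, self-contained sketch (not the paper's implementation): for a linear softmax classifier standing in for a pre-trained network, the cross-entropy gradient on deliberately mismatched image-label pairs is much larger than on the correct pairs once the model fits the data, since the model is "surprised" by the wrong labels. All function names here (`ce_gradient`, `cycle_shift`) are hypothetical helpers for this illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over class logits.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_gradient(W, X, y, num_classes):
    # Gradient of the mean cross-entropy loss w.r.t. the weights of a
    # linear classifier (a stand-in for a network's backward pass).
    probs = softmax(X @ W)
    onehot = np.eye(num_classes)[y]
    return X.T @ (probs - onehot) / len(y)

def cycle_shift(y, num_classes):
    # Label cycle shifting: pair each image with the "next" class,
    # creating deliberately mismatched image-label pairs.
    return (y + 1) % num_classes

rng = np.random.default_rng(0)
num_classes, dim = 4, 8
X = rng.normal(size=(32, dim))
y = rng.integers(0, num_classes, size=32)
W = np.zeros((dim, num_classes))

# Fit the classifier to the correct labels (mimicking "pre-trained").
for _ in range(300):
    W -= 0.3 * ce_gradient(W, X, y, num_classes)

g_correct = ce_gradient(W, X, y, num_classes)
g_shifted = ce_gradient(W, X, cycle_shift(y, num_classes), num_classes)

# Near the optimum, the correct-label gradient is small, while the
# cycle-shifted labels produce a much larger, more informative gradient
# to match against when optimizing the distilled dataset.
print(np.linalg.norm(g_correct), np.linalg.norm(g_shifted))
```

Gradient matching would then minimize a distance (e.g., cosine) between such gradients computed on real versus distilled data; the mismatched pairs simply make that signal stronger.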