Deng Chenxun, Li Dafang, Ji Lin, Zhang Chengyang, Li Baican, Yan Hongying, Zheng Jiyuan, Wang Lifeng, Zhang Junguo
School of Technology, Beijing Forestry University, Beijing, 100083, PR China; Research Center for Biodiversity Intelligent Monitoring, Beijing Forestry University, Beijing, 100083, PR China; State Key Laboratory of Efficient Production of Forest Resources, Beijing Forestry University, Beijing, 100083, PR China.
Neural Netw. 2025 Jan;181:106794. doi: 10.1016/j.neunet.2024.106794. Epub 2024 Oct 15.
Long-tailed data distributions have been a major challenge for the practical application of deep learning. Information augmentation aims to expand long-tailed data into a uniform distribution, which provides a feasible way to mitigate the data starvation of underrepresented classes. However, most existing augmentation methods face two significant challenges: (1) limited diversity in generated samples, and (2) the adverse effect of generated negative samples on downstream classification performance. In this paper, we propose a novel information augmentation method, named ChatDiff, that provides diverse positive samples for underrepresented classes and eliminates generated negative samples. Specifically, we start with a prompt template to extract textual prior knowledge from the ChatGPT-3.5 model, enhancing the feature space for underrepresented classes. Using this prior knowledge, a conditional diffusion model then generates semantically rich image samples for the tail classes. Moreover, the proposed ChatDiff leverages a CLIP-based discriminator to screen out and remove generated negative samples. This process prevents the neural network from learning invalid or erroneous features and further improves long-tailed classification performance. Comprehensive experiments conducted on long-tailed benchmarks such as CIFAR10-LT, CIFAR100-LT, ImageNet-LT, and iNaturalist 2018 validate the effectiveness of our ChatDiff method.
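The CLIP-based screening step described in the abstract can be sketched as follows: embed each generated image and the tail-class text prompt, then keep only samples whose image-text cosine similarity exceeds a threshold. The sketch below is a minimal illustration with NumPy, using placeholder embeddings in place of real CLIP encoder outputs; the function name `screen_generated_samples` and the threshold value 0.25 are assumptions for illustration, not details from the paper.

```python
import numpy as np

def screen_generated_samples(image_embs, text_emb, threshold=0.25):
    """Keep generated samples whose cosine similarity to the
    class prompt embedding exceeds `threshold` (CLIP-style filter)."""
    # L2-normalize so the dot product equals cosine similarity.
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb)
    sims = image_embs @ text_emb      # cosine similarity per generated sample
    keep = sims >= threshold          # True = retained positive sample
    return keep, sims

# Placeholder embeddings standing in for CLIP image/text encoder outputs.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=512)                     # class prompt embedding
pos = text_emb + 0.1 * rng.normal(size=(3, 512))    # on-class samples, near prompt
neg = rng.normal(size=(2, 512))                     # off-class noise, far from prompt
keep, sims = screen_generated_samples(np.vstack([pos, neg]), text_emb)
print(keep)  # the three on-class samples pass; the two noise samples are filtered out
```

In practice the embeddings would come from a pretrained CLIP image and text encoder, and the threshold would be tuned per dataset.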