IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):2897-2912. doi: 10.1109/TPAMI.2022.3178914. Epub 2023 Feb 3.
Discovering hidden patterns in imbalanced data is a critical issue in various real-world applications. Existing classification methods usually suffer from limited data, especially for minority classes, which results in unstable predictions and low performance. In this paper, a deep generative classifier is proposed to mitigate this issue via both model perturbation and data perturbation. Specifically, the proposed generative classifier is derived from a deep latent variable model involving two variables. One variable captures the essential information of the original data. Its values, called latent codes, are represented by a probability distribution rather than a single fixed value. The learned distribution is intended to encode model uncertainty and implement model perturbation, thus leading to stable predictions. The other variable is a prior over the latent codes that restricts the codes to lie on the components of a Gaussian mixture model. Acting as a confounder that affects the generative processes of the data (features and labels), the latent variables are intended to capture the discriminative latent distribution and implement data perturbation. Extensive experiments have been conducted on widely used real imbalanced image datasets. In comparisons with popular imbalanced classification baselines, the experimental results demonstrate the superiority of the proposed model on the imbalanced classification task.
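The two ideas described above, latent codes drawn from a distribution and a Gaussian mixture prior over those codes, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`sample_latent`, `gmm_responsibilities`, `predict`), the diagonal covariances, the one-component-per-class layout, and the averaging over sampled codes are all assumptions made for illustration only.

```python
import numpy as np


def sample_latent(mu, log_var, rng):
    """Draw one latent code z = mu + sigma * eps from a diagonal Gaussian.

    Assumption: the encoder output is summarised by (mu, log_var).
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def gmm_responsibilities(z, means, log_vars, weights):
    """Return p(component | z) under a diagonal-covariance Gaussian mixture.

    Shapes: z is (d,), means and log_vars are (K, d), weights is (K,).
    """
    diff = z - means
    log_pdf = -0.5 * np.sum(
        log_vars + np.log(2.0 * np.pi) + diff ** 2 / np.exp(log_vars), axis=1
    )
    log_joint = np.log(weights) + log_pdf
    log_joint = log_joint - log_joint.max()  # subtract max for numerical stability
    probs = np.exp(log_joint)
    return probs / probs.sum()


def predict(mu, log_var, means, log_vars, weights, n_samples=50, seed=0):
    """Average component posteriors over several sampled latent codes.

    Assumption: each mixture component stands for one class, so the
    averaged responsibilities are read as class probabilities.
    """
    rng = np.random.default_rng(seed)
    samples = [
        gmm_responsibilities(sample_latent(mu, log_var, rng), means, log_vars, weights)
        for _ in range(n_samples)
    ]
    return np.mean(samples, axis=0)


# Hypothetical two-class setup: component 0 is the majority class (weight 0.9),
# component 1 is the minority class (weight 0.1).
means = np.array([[-3.0, 0.0], [3.0, 0.0]])
log_vars = np.zeros((2, 2))
weights = np.array([0.9, 0.1])

# A hypothetical encoded input whose latent distribution sits near component 1.
class_probs = predict(np.array([3.0, 0.0]), np.array([-2.0, -2.0]), means, log_vars, weights)
```

Sampling several codes per input, rather than using a single point estimate, is one simple way to express the idea that the prediction should stay stable under perturbation of the latent code. How the actual model is trained and how it performs data perturbation is not specified in the abstract and is not covered here.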