Hou Binjie, Chen Gang
Department of Mathematics, Dalian Maritime University, Dalian 116026, China.
Glasgow College, University of Electronic Science and Technology of China, Chengdu 611731, China.
Math Biosci Eng. 2024 Feb 26;21(3):4309-4327. doi: 10.3934/mbe.2024190.
Traditional machine learning classifiers are strongly biased toward the majority class, so they face a great challenge when biological data are class-imbalanced. More recently, generative adversarial networks (GANs) have been applied to imbalanced data classification. In a standard GAN, the distribution of the minority class data fed into the discriminator is unknown, while the input to the generator is random noise ($z$) drawn from a standard normal distribution $N(0, 1)$. This setup inevitably increases the training difficulty of the network and reduces the quality of the generated data. To solve this problem, we propose a new oversampling algorithm, BM-WGAN, that combines the Bootstrap method (BM) with the Wasserstein GAN (WGAN). In our approach, the input to the generator network is data ($z$) drawn from the minority class distribution estimated by the BM. Once network training is complete, the generator is used to synthesize minority class data. In this way, the generator can learn useful features from the minority class and generate realistic-looking minority class samples. The experimental results indicate that BM-WGAN greatly improves classification performance compared with other oversampling algorithms. The BM-WGAN implementation is available at: https://github.com/ithbjgit1/BMWGAN.git.
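The idea of feeding the generator with draws from a bootstrap-estimated minority class distribution, rather than from $N(0, 1)$, can be illustrated with a minimal NumPy sketch. The abstract does not specify how the BM estimate is formed, so everything here is an assumption made for illustration: the function names `bootstrap_estimate` and `sample_generator_input` are hypothetical, the estimate is taken to be a per-feature mean and standard deviation averaged over bootstrap resamples, and the generator input is taken to be Gaussian with those parameters. The WGAN itself is not shown, and the authors' repository may do this differently.

```python
import numpy as np


def bootstrap_estimate(X_min, n_boot=1000, seed=0):
    """Estimate per-feature mean and std of the minority class by bootstrap.

    Assumption: the distribution is summarized by a mean and a standard
    deviation per feature; the paper's actual estimator may differ.
    """
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    # Draw n_boot resamples of size n, with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    resamples = X_min[idx]  # shape: (n_boot, n, n_features)
    # Average the per-resample statistics over all resamples.
    mu_hat = resamples.mean(axis=1).mean(axis=0)
    sigma_hat = resamples.std(axis=1, ddof=1).mean(axis=0)
    return mu_hat, sigma_hat


def sample_generator_input(mu_hat, sigma_hat, batch_size, seed=0):
    """Draw generator inputs z from N(mu_hat, sigma_hat^2) instead of N(0, 1).

    Assumption: a Gaussian parameterized by the bootstrap estimates.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(mu_hat, sigma_hat, size=(batch_size, mu_hat.shape[0]))


# Toy minority class: 40 samples with 2 features (synthetic, illustration only).
toy_rng = np.random.default_rng(42)
X_min = toy_rng.normal(loc=[5.0, -2.0], scale=[1.0, 0.5], size=(40, 2))

mu_hat, sigma_hat = bootstrap_estimate(X_min)
z = sample_generator_input(mu_hat, sigma_hat, batch_size=64)
print(z.shape)
```

In a full pipeline under these assumptions, `z` would replace the standard normal noise batch passed to the generator at each training step, so the generator starts from inputs that already resemble the minority class.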