Zeng Yifu, Zhang Yixiang, Xiao Zikai, Sui He
Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China.
Department of Information Technology, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China.
Sci Rep. 2025 Feb 12;15(1):5239. doi: 10.1038/s41598-025-89475-2.
Gene microarray technology provides an efficient way to diagnose cancer. However, microarray gene expression data pose three challenges: high dimensionality, small sample size, and multi-class imbalance. The coupling of these challenges leads to inaccurate results when traditional feature selection and classification algorithms are used. Owing to their fast learning speed and good classification performance, deep neural networks such as the generative adversarial network (GAN) have proven to be among the best classification algorithms, especially in bioinformatics. However, the GAN is limited to binary applications and is inefficient at processing high-dimensional sparse features. This paper proposes a multi-classification generative adversarial network model combined with feature bundling (MGAN-FB) to handle the coupling of high dimensionality, small sample size, and multi-class imbalance in gene microarray data classification at both the feature and algorithmic levels. At the feature level, a deep encoder structure combining a feature bundling (FB) mechanism and a squeeze-and-excite (SE) mechanism is designed for the generator, so that the sparsity, correlation, and importance of high-dimensional features are all taken into account during adaptive feature extraction. It achieves effective dimensionality reduction without excessive information loss. At the algorithmic level, a softmax module coupled with a multi-classifier is introduced into the discriminator, and a new objective function is designed specifically for the proposed MGAN-FB model, considering encoding loss, reconstruction loss, discrimination loss, and multi-classification loss. This extends the generative adversarial framework from binary classification to multi-class classification. Experiments on eight open-source gene microarray datasets, evaluated in terms of classification performance, running time, and non-parametric tests, demonstrate that the proposed method has clear advantages over seven other methods.
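The abstract names four terms in the objective function (encoding, reconstruction, discrimination, and multi-classification loss) but does not give their exact forms or weights. The following is only a minimal sketch of how such a composite objective could be assembled, not the authors' implementation. It assumes mean squared error for the encoding and reconstruction terms, binary cross-entropy for the real/fake discrimination term, and softmax cross-entropy for the multi-class term. All function names, the example values, and the equal default weights are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Multi-class cross-entropy for one sample with an integer class label."""
    return -math.log(softmax(logits)[label])


def binary_cross_entropy(p, target):
    """Binary cross-entropy for a real/fake probability p and a 0/1 target."""
    eps = 1e-12  # guards against log(0)
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def total_loss(enc, rec, disc, cls, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four loss terms named in the abstract.

    The weights are an assumption; the abstract does not specify them.
    """
    w_enc, w_rec, w_disc, w_cls = weights
    return w_enc * enc + w_rec * rec + w_disc * disc + w_cls * cls


# Hypothetical values for one sample with 4 genes, a 2-d latent code, 3 classes.
x, x_hat = [0.2, 0.0, 1.5, 0.0], [0.1, 0.0, 1.4, 0.1]   # input and reconstruction
z, z_hat = [0.5, -0.3], [0.4, -0.2]                      # latent codes of x and x_hat
class_logits, true_class = [2.0, 0.5, -1.0], 0           # multi-classifier output
p_real = 0.8                                             # discriminator "real" probability

loss = total_loss(
    enc=mse(z, z_hat),
    rec=mse(x, x_hat),
    disc=binary_cross_entropy(p_real, 1),
    cls=cross_entropy(class_logits, true_class),
)
print(f"total loss: {loss:.4f}")
```

In this sketch the softmax head is what moves the discriminator beyond a binary real/fake decision: it yields one probability per class, so the classification term can penalize errors across all classes rather than two.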