A multi-classification deep neural network for cancer type identification from high-dimension, small-sample and imbalanced gene microarray data.

Author information

Zeng Yifu, Zhang Yixiang, Xiao Zikai, Sui He

Affiliations

Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China.

Department of Information Technology, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China.

Publication information

Sci Rep. 2025 Feb 12;15(1):5239. doi: 10.1038/s41598-025-89475-2.

DOI:10.1038/s41598-025-89475-2

PMID:39939378

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11822135/

Abstract

Gene microarray technology provides an efficient way to diagnose cancer. However, microarray gene expression data are high-dimensional, small-sample, and multi-class imbalanced. The coupling of these challenges leads to inaccurate results when traditional feature selection and classification algorithms are used. Owing to their fast learning speed and good classification performance, deep neural networks such as generative adversarial networks have proven to be among the best classification algorithms, especially in bioinformatics. However, they are limited to binary applications and are inefficient at processing high-dimensional sparse features. This paper proposes a multi-classification generative adversarial network model combined with feature bundling (MGAN-FB) to handle the coupling of high dimension, small sample size, and multi-class imbalance in gene microarray data classification at both the feature and algorithmic levels. At the feature level, a deep encoder structure combining a feature bundling (FB) mechanism and a squeeze-and-excite (SE) mechanism is designed for the generator, so that the sparsity, correlation and consequence of high-dimensional features are all taken into consideration for adaptive feature extraction. It achieves effective dimensionality reduction without transitional information loss. At the algorithmic level, a softmax module coupled with a multi-classifier is introduced into the discriminator, and a new objective function is designed for the proposed MGAN-FB model that considers encode loss, reconstruction loss, discrimination loss and multi-classification loss. This extends the generative adversarial framework from binary classification to multi-class classification. Experiments on eight open-source gene microarray datasets, evaluated in terms of classification performance, running time and non-parametric tests, demonstrate that the proposed method has clear advantages over the seven compared methods.
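
The abstract names four loss terms but does not give their formulas or how they are combined. As a rough illustration only, the sketch below assumes a weighted sum, mean squared error for the encode and reconstruction terms, the standard GAN discriminator loss, and softmax cross-entropy for the multi-class term. The function names, argument names and weights are all hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Multi-class cross-entropy for one sample with an integer class label."""
    return -math.log(softmax(logits)[label])

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def combined_loss(x, x_rec, z, z_rec, d_real, d_fake, class_logits, label,
                  weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of four loss terms (all definitions here are assumptions).

    x, x_rec     : input expression vector and its reconstruction
    z, z_rec     : latent code and the code of the reconstruction
    d_real/d_fake: discriminator probabilities in (0, 1) for real/generated data
    class_logits : unnormalized scores, one per cancer type
    label        : integer index of the true class
    """
    w_enc, w_rec, w_dis, w_cls = weights
    l_enc = mse(z, z_rec)                                  # assumed encode loss
    l_rec = mse(x, x_rec)                                  # assumed reconstruction loss
    l_dis = -(math.log(d_real) + math.log(1.0 - d_fake))   # standard GAN discriminator loss
    l_cls = cross_entropy(class_logits, label)             # multi-class loss via softmax
    return w_enc * l_enc + w_rec * l_rec + w_dis * l_dis + w_cls * l_cls

# Example call with a three-class problem and a perfect reconstruction.
loss = combined_loss([1.0, 2.0], [1.0, 2.0], [0.5], [0.5],
                     d_real=0.5, d_fake=0.5,
                     class_logits=[0.0, 0.0, 0.0], label=1)
```

Replacing a single sigmoid output with a softmax over all classes is what lets a discriminator-side classifier handle more than two cancer types; the weights would in practice be tuned per dataset.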

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3415/11822135/61ba8e4dea85/41598_2025_89475_Fig1_HTML.jpg
