Zhan Siyu, Yu Hao, Liu Shuang, Qin Ke, Guo Lu
Institute of Intelligent Computing, University of Electronic Science and Technology of China, Chengdu, Sichuan, China.
Trusted Cloud Computing and Big Data Key Laboratory of Sichuan Province, Chengdu, Sichuan, China.
Front Genet. 2025 Apr 30;16:1583081. doi: 10.3389/fgene.2025.1583081. eCollection 2025.
Gene expression analysis plays a critical role in lung cancer research, offering molecular feature-based diagnostic insights that are particularly effective in distinguishing lung cancer subtypes. However, the high dimensionality and inherent imbalance of gene expression data create significant challenges for accurate diagnosis. This study aims to address these challenges by proposing an innovative deep learning-based method for predicting lung cancer subtypes.
We propose a method called Exo-LCClassifier, which integrates feature selection, a one-dimensional convolutional neural network (1D CNN), and an improved Wasserstein Generative Adversarial Network (WGAN). First, differential gene expression analysis was performed using DESeq2 to identify genes that are significantly differentially expressed between normal and tumor tissues. Next, the improved WGAN was applied to augment the dataset, addressing sample imbalance and increasing the diversity of effective samples. Finally, a 1D CNN was used to classify the balanced dataset, thereby improving the model's diagnostic accuracy.
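Two pieces of this pipeline can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation (that is in their repository). The function names `synthetic_counts` and `conv1d_valid`, the class labels, and the choice to top up every class to the size of the largest one are all assumptions made for illustration; the abstract does not state the balancing target or the network's layer configuration.

```python
from collections import Counter


def synthetic_counts(labels):
    """Number of synthetic samples a generator would need per class.

    Assumption: every class is topped up to the size of the largest class.
    """
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def conv1d_valid(signal, kernel):
    """Slide a kernel over a 1D vector with no padding and stride 1.

    This is the cross-correlation that deep learning frameworks call a
    1D convolution; here `signal` stands in for one sample's vector of
    selected gene expression values.
    """
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


# Hypothetical class labels "A", "B", "C" with 5, 3 and 2 samples.
labels = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
needed = synthetic_counts(labels)

# Hypothetical expression vector and a simple difference kernel.
feature_map = conv1d_valid([1, 2, 3, 4], [1, 0, -1])
```

In a real model the kernel weights are learned and many kernels are applied in parallel; the sketch only shows how one filter scans an expression vector and how an augmentation budget might be derived from class counts.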
The proposed method was evaluated using five-fold cross-validation, achieving an average accuracy of 0.9766 ± 0.0070, precision of 0.9762 ± 0.0101, recall of 0.9827 ± 0.0050, and F1-score of 0.9793 ± 0.0068. On an external GEO lung cancer dataset, it also showed strong performance with an accuracy of 0.9588, precision of 0.9558, recall of 0.9678, and F1-score of 0.9616.
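The "mean ± standard deviation" figures above are per-fold metrics aggregated across the five folds. A minimal sketch of that aggregation follows, under stated assumptions: labels are binary (1 = positive), the spread is the sample standard deviation, and the helper names `binary_metrics` and `summarize_folds` are hypothetical. The abstract does not say how multi-class metrics were averaged or which standard deviation was reported.

```python
from statistics import mean, stdev


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }


def summarize_folds(fold_results):
    """Return {metric: (mean, sample std)} across per-fold metric dicts."""
    summary = {}
    for name in fold_results[0]:
        values = [fold[name] for fold in fold_results]
        summary[name] = (mean(values), stdev(values))
    return summary


# Toy predictions for two folds (illustrative values, not data from the study).
fold_a = binary_metrics([1, 1, 0, 0], [1, 0, 0, 1])
fold_b = binary_metrics([1, 1, 0, 0], [1, 1, 0, 0])
summary = summarize_folds([fold_a, fold_b])
```

With five folds, `summarize_folds` would receive five metric dictionaries, one per held-out fold.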
This study addresses the critical challenge of imbalanced learning in lung cancer gene expression analysis through an innovative computational framework. Our solution integrates three techniques: (1) DESeq2 for differential expression analysis, (2) WGAN for data augmentation, and (3) 1D CNN for feature learning and classification. The source code is publicly available at: https://github.com/lanlinxxs/Exo-classifier.