Jia Dongfang, Chen Cheng, Chen Chen, Chen Fangfang, Zhang Ningrui, Yan Ziwei, Lv Xiaoyi
College of Information Science and Engineering, Xinjiang University, Urumqi, China.
Key Laboratory of Signal Detection and Processing, Xinjiang University, Urumqi, China.
Front Genet. 2021 May 17;12:628136. doi: 10.3389/fgene.2021.628136. eCollection 2021.
Understanding the molecular mechanisms of breast cancer (BC) can provide in-depth insight into BC pathology. This study reviewed existing technologies for diagnosing BC, such as mammography, ultrasound, magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET), and summarized the disadvantages of these existing diagnostic approaches. The aim of this article is to use gene expression profiles from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) to classify BC samples versus normal samples. The proposed method overcomes some of the shortcomings of traditional diagnostic methods: it enables faster BC diagnosis with high sensitivity and involves no radiation. The study first selected the genes most relevant to cancer through weighted gene co-expression network analysis (WGCNA) and differential expression analysis (DEA). It then used a protein-protein interaction (PPI) network to screen 23 hub genes. Finally, it used a support vector machine (SVM), a decision tree (DT), a Bayesian network (BN), an artificial neural network (ANN), and two convolutional neural networks (CNN-LeNet and CNN-AlexNet) to process the expression levels of the 23 hub genes. On the gene expression profiles, the ANN model performed best at classifying cancer samples. Averaged over ten runs, its accuracy was 97.36% (±0.34%), its F1 score 0.8535 (±0.0260), its sensitivity 98.32% (±0.32%), its specificity 89.59% (±3.53%), and its AUC 0.99. In summary, this method effectively distinguishes cancer samples from normal samples and offers new ideas for the early diagnosis of cancer.
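The abstract reports accuracy, F1, sensitivity, specificity and AUC averaged over ten runs. The following minimal Python sketch shows how such figures are commonly computed from a classifier's outputs; it is not the authors' code. It assumes labels are encoded as 1 for tumour and 0 for normal, with tumour as the positive class (the abstract does not say which class is positive), and it reads "±" as the sample standard deviation across runs (the abstract also does not say whether the ten runs are random splits or cross-validation folds). The helper names `binary_metrics`, `roc_auc` and `summarize_runs` are hypothetical.

```python
from statistics import mean, stdev

# Assumption: 1 = tumour (positive class), 0 = normal.


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall), specificity and F1 for a single run."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive sample scores higher
    than a random negative one (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_runs(per_run):
    """Mean and sample standard deviation of each metric over repeated runs
    (needs at least two runs for the standard deviation)."""
    keys = per_run[0].keys()
    return {k: (mean(r[k] for r in per_run), stdev(r[k] for r in per_run))
            for k in keys}
```

For example, with `y_true = [1, 1, 1, 1, 0, 0]` and `y_pred = [1, 1, 1, 0, 0, 1]`, `binary_metrics` gives an accuracy of 4/6, a sensitivity of 0.75, a specificity of 0.5 and an F1 of 0.75. Because sensitivity, specificity and F1 all depend on which class is treated as positive, that convention needs to be fixed before models are compared.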