Zakir Mahrukh, Saddiqa Alishbah, Sheikh Mawara, Zakir Lalarukh, Sami Fatima, Ahmad Faisal Sardar, Rauf Sadaf Abdul, Ali Iqra, Muneer Zahid, Alonazi Wadi B, Siddiqi Abdul Rauf
Department of Biosciences, COMSATS University, Park Road Islamabad, Islamabad, Pakistan.
Pakistan Agriculture Research Council Islamabad, Islamabad, Pakistan.
Sci Rep. 2025 Jul 2;15(1):23675. doi: 10.1038/s41598-025-94084-0.
Breast cancer (BC) is among the most prevalent and lethal cancers and a leading medical concern for women. Its etiology implicates several cellular protein receptors, such as estrogen receptors (ER), progesterone receptors (PR), and human epidermal growth factor receptor 2 (HER2), which switch on oncogenic cascades that are often attributed to specific genetic variations. Breast cancer is accordingly classified into ER+/-, PR+/-, HER2+/- and triple-negative (TNBC) types. This study builds on current knowledge of the HER2+ and TNBC subtypes to discover novel patterns for diagnosis and prognosis. It exploits the wealth of HER2+ and TNBC transcriptome (RNA-Seq) data to elucidate the key hub genes, their associated networks and pathways, their stage-wise expression profiles, their role in prognosis and survival expectancy, and their regulatory transcription factors. The study also employs machine learning (ML) models, including support vector machine (SVM), XGBoost, Random Forest, k-nearest neighbor (kNN), Naïve Bayes and a Voting Classifier, to distinguish HER2+ from TNBC transcriptomes, a distinction that is key to early detection and the choice of therapeutic alternatives. RNA-Seq datasets consisting of 49 HER2+ and 44 TNBC breast tumor samples were retrieved and pre-processed. Differentially expressed genes (DEGs) were obtained along with their logFC and p-values. KEGG (Kyoto Encyclopedia of Genes and Genomes) and GO (Gene Ontology) analyses of the DEGs were conducted in DAVID (the Database for Annotation, Visualization and Integrated Discovery), and an interaction network was constructed in Cytoscape. Ten hub genes were identified with cytoHubba on the basis of maximal clique centrality (MCC), maximum neighborhood component (MNC), degree, closeness and betweenness: ACTB, ATM, ESR1, GAPDH, HNRNPK, KRAS, MDM2, SIRT1, TP53, and H3F3C (H3-5). These hub genes were found to be associated with cell proliferation, invasion and migration. The transcription factors regulating these hub genes, and the association of their expression profiles with survival expectancy, were also determined. Among the ML models, SVM stood out, separating HER2+ from TNBC transcriptomes with an accuracy of 90%. The findings of this study can therefore aid in tracing the initial prognosis of BC and in identifying biomarkers for the personalized prevention, prediction, diagnosis, and treatment of BC.
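As a rough illustration of the hub-gene step described above, the following minimal sketch ranks the nodes of a small interaction network by degree and closeness, two of the five measures named in the abstract. It is not the cytoHubba implementation and does not reproduce the study's network: the node names `G1` to `G7`, the edge list `TOY_EDGES`, and the helper functions are all hypothetical, and closeness is computed with the common textbook definition (reachable nodes divided by the sum of shortest-path distances), which may differ from the definition cytoHubba uses.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Degree = number of direct interaction partners of each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def closeness_centrality(graph):
    """Closeness = (reachable nodes) / (sum of shortest-path distances).

    Distances come from a breadth-first search, since edges are unweighted.
    """
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def top_hubs(graph, k):
    """Rank by degree, then closeness, then name; return the top k nodes."""
    deg = degree_centrality(graph)
    clo = closeness_centrality(graph)
    ranked = sorted(graph, key=lambda n: (-deg[n], -clo[n], n))
    return ranked[:k]


# Hypothetical toy network; the names are placeholders, not real genes.
TOY_EDGES = [
    ("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G1", "G5"),
    ("G2", "G3"), ("G4", "G6"), ("G6", "G7"),
]

if __name__ == "__main__":
    toy_graph = build_graph(TOY_EDGES)
    print(top_hubs(toy_graph, 2))
```

In this toy network `G1` ranks first because it has the most direct partners, and `G4` ranks second because, among the nodes with two partners, it lies closest to the rest of the network. Combining several measures, as the study does with five, reduces the chance that a node is selected on the strength of a single property.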