Computational Biology and Bioinformatics Group (CBBG), Department of Biosciences, COMSATS University Islamabad, Park Road Islamabad, Islamabad, Pakistan.
Pakistan Agriculture Research Council, Islamabad, Pakistan.
Sci Rep. 2024 Sep 6;14(1):20840. doi: 10.1038/s41598-024-69721-9.
Breast cancer (BC) is a malignant neoplasm classified into several types defined by underlying molecular factors: estrogen receptor positive (ER+), progesterone receptor positive (PR+), human epidermal growth factor receptor 2 positive (HER2+) and triple negative (TNBC). Early detection of ER+ and TNBC is crucial for diagnosis and for choosing an appropriate treatment strategy. Here we report the key genes associated with ER+ and TNBC, identified using RNA-Seq analysis and machine learning (ML) models. Three ER+ and TNBC RNA-Seq datasets comprising 164 patients in total were selected and processed with standard NGS hierarchical data processing and data analysis protocols. Pathway enrichment analysis and network analysis were performed, and the top hub genes were identified. To build a reliable classifier that could distinguish the distinct transcriptome patterns associated with ER+ and TNBC, ML models were built using Naïve Bayes, support vector machine (SVM) and k-Nearest Neighbor (kNN) algorithms. A total of 1730 common differentially expressed genes (DEGs) with significant log fold change (logFC) values at a p-value threshold of 0.05 were identified. The top ten hub genes, ranked by maximal clique centrality (MCC), were CDC20, CDK1, BUB1, AURKA, CDCA8, RRM2, TTK, CENPF, CEP55 and NDC80. These genes were found to be involved in crucial cell cycle pathways. The kNN model was the best classifier for differentiating between ER+ and TNBC RNA-Seq transcriptomes, with an accuracy of 84%, a specificity of 66% and a sensitivity of 95%. Our list of 10 hub genes can thus help uncover novel molecular signatures implicated in ER+ and TNBC onset and prognosis, and support the design of novel protocols for breast cancer diagnostics and therapeutics.
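The accuracy, specificity and sensitivity reported for the kNN classifier are standard confusion-matrix quantities. The following is a minimal sketch of how such figures are typically computed for a two-class problem; it is not the authors' code. The function name `classification_metrics`, the choice of TNBC as the "positive" class and the toy labels are all assumptions made for illustration, and the toy values are not data from the study.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, positive):
    """Return accuracy, sensitivity and specificity for a binary problem.

    `positive` names the class treated as positive (an assumption here:
    the abstract does not say which subtype played that role).
    Assumes both classes occur at least once in `y_true`.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            counts["tp" if pred == positive else "fn"] += 1
        else:
            counts["fp" if pred == positive else "tn"] += 1

    tp, fn, fp, tn = (counts[k] for k in ("tp", "fn", "fp", "tn"))
    return {
        "accuracy": (tp + tn) / len(y_true),   # correct calls / all calls
        "sensitivity": tp / (tp + fn),         # true positive rate
        "specificity": tn / (tn + fp),         # true negative rate
    }


# Hypothetical toy labels, for illustration only (not data from the study).
y_true = ["TNBC", "TNBC", "TNBC", "TNBC", "ER+", "ER+", "ER+", "ER+"]
y_pred = ["TNBC", "TNBC", "TNBC", "ER+", "ER+", "ER+", "ER+", "TNBC"]

metrics = classification_metrics(y_true, y_pred, positive="TNBC")
print(metrics)
```

A gap like the one reported (95% sensitivity against 66% specificity) means the classifier misses few samples of the positive class but mislabels a sizeable share of the other class, which overall accuracy alone would not reveal.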