Mirza Zeenat, Ansari Md Shahid, Iqbal Md Shahid, Ahmad Nesar, Alganmi Nofe, Banjar Haneen, Al-Qahtani Mohammed H, Karim Sajjad
King Fahd Medical Research Center, King Abdulaziz University, Jeddah 21589, Saudi Arabia.
Department of Medical Laboratory Science, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia.
Cancers (Basel). 2023 Jun 18;15(12):3237. doi: 10.3390/cancers15123237.
Breast cancer (BC) is one of the most common cancers in women. Clinical and histopathological information is used together for diagnosis, but it is often imprecise. We applied machine learning (ML) methods to identify a gene signature model based on differentially expressed genes (DEGs) for BC diagnosis and prognosis.
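A cohort of 701 samples from 11 Gene Expression Omnibus (GEO) BC microarray datasets was used to identify significant DEGs. Seven ML methods, including recursive feature elimination with cross-validation (RFECV) combined with logistic regression (RFECV-LR) or a support vector machine (RFECV-SVM), L1-regularized logistic regression (LR-L1), L1-regularized support vector classification (SVC-L1), random forest (RF), and extremely randomized trees (Extra-Trees), were applied for gene reduction and to construct a diagnostic model for cancer classification. Kaplan-Meier survival analysis was performed to construct the prognostic signature. The potential biomarkers were confirmed by qRT-PCR and validated with another set of ML methods, including gradient boosting decision trees (GBDT), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), k-nearest neighbors (KNN), and a multilayer perceptron (MLP).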
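To make the LR-L1 gene-reduction idea concrete, the sketch below fits an L1-penalized logistic regression by proximal gradient descent (ISTA) in plain Python and keeps only the genes whose coefficients stay nonzero. This is an illustration under stated assumptions, not the authors' pipeline: the solver, the function name `l1_logistic_select`, the penalty strength `lam`, the step size and the toy expression matrix are all hypothetical. In practice a library implementation (for example scikit-learn's `LogisticRegression` with `penalty="l1"`) would normally be used.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v: float, t: float) -> float:
    # Proximal operator of t*|v|: shrinks v toward zero and sets it to exactly zero if |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def l1_logistic_select(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.1,     # hypothetical L1 penalty strength
    step: float = 0.5,    # hypothetical gradient step size
    n_iter: int = 500,
) -> List[int]:
    """Fit L1-penalized logistic regression with ISTA and return the
    indices of genes (columns of X) that keep a nonzero coefficient."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0  # intercept, left unpenalized
    for _ in range(n_iter):
        # Residuals: predicted probability of the tumour class minus the label.
        r = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
             for row, yi in zip(X, y)]
        # Gradient of the mean log-loss with respect to each weight and the intercept.
        grad_w = [sum(r[i] * X[i][j] for i in range(n)) / n for j in range(p)]
        grad_b = sum(r) / n
        # Gradient step, then soft-thresholding (the L1 proximal step) on the weights only.
        w = [soft_threshold(w[j] - step * grad_w[j], step * lam) for j in range(p)]
        b -= step * grad_b
    return [j for j in range(p) if w[j] != 0.0]


# Hypothetical toy data: 8 samples x 3 genes, label 1 = tumour, 0 = normal.
# Gene 0 separates the classes; genes 1 and 2 are balanced within each class (noise).
X_toy = [
    [1.0, 1.0, 1.0], [1.0, -1.0, 1.0], [1.0, 1.0, -1.0], [1.0, -1.0, -1.0],
    [-1.0, 1.0, 1.0], [-1.0, -1.0, 1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, -1.0],
]
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]

print(l1_logistic_select(X_toy, y_toy))
```

The L1 penalty is what makes this a selection method: coefficients of uninformative genes are driven to exactly zero rather than merely shrunk, so the surviving indices form a reduced gene set that can then be passed to a classifier.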
We identified 355 DEGs and predicted BC-associated pathways, including kinetochore metaphase signaling, PTEN, senescence, and phagosome-formation pathways. Using stringent filter conditions, we identified a hub of 28 DEGs and a novel diagnostic nine-gene signature (, , , , , , and ). Similarly, disease-free survival and overall survival analyses identified a novel prognostic model consisting of an eight-gene signature (, , , , , , , and ). The gene signatures were validated by another set of ML methods. Finally, qRT-PCR confirmed the expression of the identified gene signatures in BC.
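The prognostic signature rests on Kaplan-Meier survival analysis of disease-free and overall survival. A minimal product-limit estimator, S(t) = ∏ over event times t_i ≤ t of (1 − d_i / n_i), with d_i events among n_i patients still at risk, is sketched below in plain Python. It assumes each patient contributes a follow-up time and an event flag (1 = death or relapse observed, 0 = censored); the function name `kaplan_meier` and the toy follow-up data are hypothetical. The abstract does not say how patients were grouped, so the common practice of comparing high- and low-expression groups for each gene (for example with a log-rank test) is not implemented here.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, estimated survival probability) at each distinct event time.

    events[i] is 1 if the event (death or relapse) was observed at times[i],
    and 0 if the patient was censored at that time.
    """
    # Sort patients by follow-up time.
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival at time t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Both events and censored patients leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Hypothetical follow-up times (months) and event flags for six patients.
for t, s in kaplan_meier([5, 8, 8, 12, 15, 20], [1, 1, 0, 1, 0, 1]):
    print(f"t = {t:>2}: S(t) = {s:.3f}")
```

Censored patients never cause a drop in the curve; they only shrink the risk set, which is why the censoring at 15 months makes the final event at 20 months reduce the estimate to zero.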
The ML approach helped construct novel diagnostic and prognostic models based on BC expression profiles. The identified nine-gene and eight-gene signatures showed excellent potential for BC diagnosis and prognosis, respectively.