Refining breast cancer biomarker discovery and drug targeting through an advanced data-driven approach.

Affiliations

Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran.

Cancer Biology Research Center, Cancer Institute, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Keshavarz Boulevard, Tehran, 1419733141, Tehran, Iran.

Publication information

BMC Bioinformatics. 2024 Jan 22;25(1):33. doi: 10.1186/s12859-024-05657-1.

Abstract

Breast cancer remains a major public health challenge worldwide. The identification of accurate biomarkers is critical for the early detection and effective treatment of breast cancer. This study uses an integrative machine learning approach to analyze breast cancer gene expression data for superior biomarker and drug target discovery. Gene expression datasets obtained from the GEO database were preprocessed and then merged. In the merged dataset, differential expression analysis between breast cancer and normal samples revealed 164 differentially expressed genes (DEGs). A separate gene expression dataset yielded 350 DEGs. In addition, the BGWO_SA_Ens algorithm, which integrates binary grey wolf optimization and simulated annealing with an ensemble classifier, was applied to the gene expression datasets to identify predictive genes, including TOP2A, AKR1C3, EZH2, MMP1, EDNRB, S100B, and SPP1. From over 10,000 genes, BGWO_SA_Ens selected 1404 in the merged dataset (F1 score: 0.981, PR-AUC: 0.998, ROC-AUC: 0.995) and 1710 in the GSE45827 dataset (F1 score: 0.965, PR-AUC: 0.986, ROC-AUC: 0.972). The intersection of the DEGs and the BGWO_SA_Ens-selected genes revealed 35 superior genes that were consistently significant across methods. Enrichment analyses uncovered the involvement of these superior genes in key pathways such as AMPK, adipocytokine, and PPAR signaling. Protein-protein interaction network analysis highlighted subnetworks and central nodes. Finally, a drug-gene interaction investigation revealed connections between the superior genes and anticancer drugs. Collectively, the machine learning workflow identified a robust gene signature for breast cancer, illuminated the biological roles, interactions, and therapeutic associations of its genes, and underscored the potential of computational approaches in biomarker discovery and precision oncology.
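The abstract names the selection technique but not its details, so the following is only a minimal sketch of how a binary grey wolf search with a simulated annealing refinement might pick a feature subset and how the result could be intersected with a DEG list. Everything specific here is an assumption for illustration: the toy `fitness` function stands in for the ensemble classifier's score, the sigmoid transfer rule and the cooling schedule are common textbook choices rather than the authors' settings, and the gene names and `hypothetical_degs` are placeholders, not data from the study.

```python
import math
import random


def fitness(mask, informative, penalty=0.1):
    """Toy stand-in for a classifier score: reward hits, penalize subset size."""
    hits = sum(mask[i] for i in informative)
    return hits / len(informative) - penalty * sum(mask) / len(mask)


def bgwo_sa(n_features, informative, n_wolves=8, n_iter=30, seed=0):
    """Binary grey wolf search followed by a short annealing pass per iteration."""
    rng = random.Random(seed)

    def score(mask):
        return fitness(mask, informative)

    wolves = [[rng.randint(0, 1) for _ in range(n_features)]
              for _ in range(n_wolves)]
    best = max(wolves, key=score)[:]
    history = [score(best)]

    for t in range(n_iter):
        a = 2.0 * (1 - t / n_iter)  # exploration factor shrinks from 2 towards 0
        ranked = sorted(wolves, key=score, reverse=True)
        alpha, beta, delta = (w[:] for w in ranked[:3])
        for w in wolves:
            for d in range(n_features):
                pos = 0.0
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    pos += leader[d] - A * abs(C * leader[d] - w[d])
                # Sigmoid transfer maps the averaged position to a bit probability.
                prob = 1 / (1 + math.exp(-10 * (pos / 3 - 0.5)))
                w[d] = 1 if rng.random() < prob else 0

        # Simulated annealing: single-bit flips around the current best wolf.
        current = max(wolves, key=score)[:]
        if score(current) > score(best):
            best = current[:]
        temp = 1.0
        for _ in range(20):
            cand = current[:]
            cand[rng.randrange(n_features)] ^= 1  # flip one bit
            gain = score(cand) - score(current)
            if gain >= 0 or rng.random() < math.exp(gain / temp):
                current = cand
            if score(current) > score(best):
                best = current[:]
            temp *= 0.9  # geometric cooling
        history.append(score(best))

    return best, history


# Placeholder gene names; features 0, 3 and 7 are "informative" by construction.
genes = [f"gene_{i}" for i in range(20)]
mask, history = bgwo_sa(20, informative=[0, 3, 7])
selected = {g for g, bit in zip(genes, mask) if bit}

# Consensus step: keep only genes that both methods agree on.
hypothetical_degs = {"gene_0", "gene_3", "gene_5"}
consensus = selected & hypothetical_degs
```

In a real workflow the toy `fitness` would be replaced by a cross-validated score of the classifier on the masked expression matrix, which is where metrics such as F1, PR-AUC, and ROC-AUC would enter.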

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d9a6/10810249/e7ee28b2cac6/12859_2024_5657_Figa_HTML.jpg
