Yuan Jianxu, Zhou Dalin, Yu Shengjie
Department of Surgery, The Second Affiliated Hospital of Chongqing Medical University, Chongqing Medical University, Chongqing, China.
Transl Androl Urol. 2025 Jun 30;14(6):1528-1541. doi: 10.21037/tau-2025-242. Epub 2025 Jun 26.
Prostate cancer (PCa) is a common malignancy among men globally, and biomarkers are needed for its early diagnosis and for predicting its progression. This study aimed to identify the key genes involved in the occurrence and development of PCa.
Leveraging data from the Gene Expression Omnibus (GEO) database, this study integrated multi-chip datasets, conducting differential expression analysis and enrichment analysis to pinpoint PCa-related genes. Subsequently, machine learning models were constructed using least absolute shrinkage and selection operator (LASSO) regression, support vector machine (SVM), and random forest (RF) methods. The optimal model was selected for further study and the contribution of related genes was explained using SHapley Additive exPlanations (SHAP) analysis. Furthermore, gene set enrichment analysis (GSEA) and immune cell infiltration analysis were utilized to uncover the underlying molecular mechanisms.
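To make the SHAP step concrete, the following is a minimal sketch of the Shapley attribution idea that SHAP builds on. It is not the study's pipeline and does not use the SHAP library: it computes exact Shapley values by enumerating feature coalitions, which is only feasible for a handful of features. The scoring function `toy_score`, its three made-up "gene expression" inputs, and the all-zero baseline are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, sample, baseline):
    """Exact Shapley values for one sample relative to a baseline.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n in the number of features, so this is a teaching
    sketch, not a substitute for the approximations used by SHAP.
    """
    n = len(sample)
    features = range(n)
    phi = [0.0] * n

    def value(coalition):
        # Model output when only the features in `coalition` are "present".
        x = [sample[i] if i in coalition else baseline[i] for i in features]
        return predict(x)

    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(x):
    # Hypothetical score over three made-up expression features,
    # with one interaction term between the first and third.
    g1, g2, g3 = x
    return 2.0 * g1 - 1.0 * g2 + 0.5 * g1 * g3


sample = [1.0, 2.0, 4.0]    # hypothetical expression values for one tissue sample
baseline = [0.0, 0.0, 0.0]  # hypothetical reference profile

phi = shapley_values(toy_score, sample, baseline)
for name, contribution in zip(["g1", "g2", "g3"], phi):
    print(f"{name}: {contribution:+.3f}")
```

In this toy case the attributions sum to the difference between the score for the sample and the score for the baseline, which is the additivity property that gives SHAP its name. The interaction term is split equally between the two features that take part in it.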
In this study, 222 differentially expressed genes (DEGs) were identified and found to be enriched in functions and pathways potentially associated with PCa. Using multiple machine learning models, eight PCa-related core genes were identified. The RF model, which was the most accurate, was selected for further study with SHAP analysis, which revealed the contribution of each of these genes. GSEA and immune cell infiltration analysis uncovered differences between PCa and normal tissues.
This study offers potential biomarkers and a theoretical basis for the diagnosis and treatment of PCa.