Chang Shao-Hsuan, Yeh Lung-Kun, Hung Kuo-Hsuan, Chiu Yen-Jung, Hsieh Chia-Hsun, Ma Chung-Pei
Department of Biomedical Engineering, Chang Gung University, Taoyuan 33302, Taiwan.
Department of Ophthalmology, Linkou Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan.
Biomedicines. 2025 Apr 24;13(5):1032. doi: 10.3390/biomedicines13051032.
Keratoconus (KTCN) is a multifactorial disease characterized by progressive corneal degeneration. Recent studies suggest that gene expression analysis of corneal tissue may uncover novel biomarkers involved in corneal matrix remodeling. However, identifying reliable combinations of biomarkers linked to disease risk or progression remains a significant challenge. In this study, we applied multiple machine learning algorithms to the transcriptomes of keratoconus patients to identify feature gene combinations and their functional associations, with the aim of improving the understanding of keratoconus pathogenesis. We analyzed the GSE77938 (PRJNA312169) dataset for differential gene expression (DGE) and performed gene set enrichment analysis (GSEA) with Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways to identify pathways enriched in KTCN versus controls. Machine learning algorithms were then applied to the gene sets, and SHapley Additive exPlanations (SHAP) was used to assess the contribution of key feature genes to the models' predictions. Selected feature genes were further analyzed by Gene Ontology (GO) enrichment to explore their roles in biological processes and cellular functions. The machine learning models, including XGBoost, Random Forest, Logistic Regression, and SVM, identified a set of important feature genes associated with keratoconus, with 15 notable genes appearing across multiple models, such as , , , , , , , , and others. Compared with controls, the genes under-expressed in KTCN were involved in the mechanical resistance of the epidermis (, ) and in inflammatory pathways (, , , , and ). The GO analysis highlighted that the complex and its associated genes were primarily involved in biological processes related to cytoskeleton organization, inflammation, and immune response.
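The SHAP step and the cross-model consensus can be illustrated with a minimal sketch. SHAP attributes a prediction to input features using Shapley values, i.e. each feature's weighted average marginal contribution over all coalitions of the other features. For a handful of genes these values can be computed exactly by brute-force enumeration, which makes the definition explicit. The sketch assumes that genes outside a coalition are set to a fixed baseline (for example, mean expression in controls), and the hypothetical `consensus_genes` helper assumes that "appearing across multiple models" means ranking in the top k of at least m models. All function names, gene labels, the toy scoring function, and the k/m thresholds are illustrative assumptions, not the authors' pipeline; the `shap` library itself uses faster model-specific algorithms rather than enumeration.

```python
from collections import Counter
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values for one sample by enumerating all coalitions.

    Genes outside a coalition are set to their baseline value (an assumed
    reference such as mean control expression). This needs O(2^n) model
    calls, so it only suits a handful of genes.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Keep the sample's value for genes in the coalition, baseline otherwise.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def consensus_genes(
    rankings: dict[str, list[str]], top_k: int, min_models: int
) -> list[str]:
    """Hypothetical helper: genes in the top_k of at least min_models models."""
    counts = Counter(
        gene for ranked in rankings.values() for gene in set(ranked[:top_k])
    )
    return sorted(gene for gene, c in counts.items() if c >= min_models)


if __name__ == "__main__":
    genes = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical placeholder names

    def toy_score(z: Sequence[float]) -> float:
        # Stand-in for a trained classifier's decision score (not a real model).
        return 0.8 * z[0] - 0.5 * z[1] + 0.1 * z[2]

    phi = exact_shapley(toy_score, x=[3.0, 1.0, 2.0], baseline=[1.0, 1.0, 1.0])
    ranked = [g for g, _ in sorted(zip(genes, phi), key=lambda t: -abs(t[1]))]
    print(dict(zip(genes, phi)), ranked)

    # Hypothetical per-model rankings (e.g. by mean |SHAP|), for illustration.
    rankings = {
        "xgboost": ["GENE_A", "GENE_B", "GENE_C"],
        "random_forest": ["GENE_B", "GENE_A", "GENE_D"],
        "svm": ["GENE_E", "GENE_B", "GENE_F"],
    }
    print(consensus_genes(rankings, top_k=2, min_models=2))
```

For a linear score under this single-baseline convention, each gene's Shapley value reduces to its weight times (x_i - baseline_i), which gives an easy sanity check; the values also sum to f(x) - f(baseline), the efficiency property that SHAP explanations satisfy.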
Furthermore, we extended our analysis by incorporating additional datasets from PRJNA636666 and PRJNA1184491, thereby broadening the representation of gene features and increasing the generalizability of our results across diverse cohorts. The differing gene sets identified by XGBoost and SVM may reflect distinct but complementary aspects of keratoconus pathophysiology. XGBoost captured key immune and chemotactic regulators (e.g., , ), suggesting upstream inflammatory signaling pathways, whereas SVM highlighted structural and epithelial differentiation markers (e.g., , ), possibly reflecting downstream tissue remodeling and stress responses. Our findings provide a novel machine learning-based research platform for evaluating keratoconus, offering valuable insights into its pathogenesis and potential therapeutic targets.
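The comparison between XGBoost- and SVM-selected gene sets can be made concrete with simple set statistics. The sketch below is not taken from the study. It reports shared and model-specific genes and their Jaccard index. It also computes a one-sided hypergeometric tail probability, which asks whether the overlap is larger than expected if one set were drawn at random from a universe of measured genes. The gene labels (`G1` to `G5`) and the universe size of 20,000 are hypothetical placeholders, and the hypergeometric test is an assumed choice of overlap statistic.

```python
from math import comb


def overlap_summary(genes_a: set[str], genes_b: set[str]) -> dict:
    """Shared genes, model-specific genes, and the Jaccard index."""
    shared = genes_a & genes_b
    union = genes_a | genes_b
    return {
        "shared": sorted(shared),
        "only_a": sorted(genes_a - genes_b),
        "only_b": sorted(genes_b - genes_a),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


def hypergeom_upper_tail(k: int, n_universe: int, n_a: int, n_b: int) -> float:
    """P(overlap >= k) when n_b genes are drawn without replacement from
    n_universe genes, of which n_a belong to the other model's set."""
    total = comb(n_universe, n_b)
    hits = sum(
        comb(n_a, i) * comb(n_universe - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return hits / total


if __name__ == "__main__":
    # Hypothetical gene labels and universe size, for illustration only.
    xgb_genes = {"G1", "G2", "G3", "G4"}
    svm_genes = {"G3", "G4", "G5"}
    summary = overlap_summary(xgb_genes, svm_genes)
    p = hypergeom_upper_tail(
        k=len(summary["shared"]),
        n_universe=20_000,
        n_a=len(xgb_genes),
        n_b=len(svm_genes),
    )
    print(summary)
    print(f"P(overlap >= {len(summary['shared'])}) = {p:.3g}")
```

A small tail probability would suggest more agreement between the two models than chance. A low Jaccard index combined with a non-significant overlap is consistent with the models emphasizing different genes, although this alone does not show that those genes reflect complementary biology.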