Anjum Farah, Alsharif Abdulaziz, Bakhuraysah Maha, Shafie Alaa, Hassan Md Imtaiyaz, Mohammad Taj
Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, Taif University, P.O. Box 11099, 21944, Taif, Saudi Arabia.
King Salman Center for Disability Research, Riyadh, 11614, Saudi Arabia.
J Mol Neurosci. 2025 Apr 30;75(2):61. doi: 10.1007/s12031-025-02340-9.
Amyotrophic lateral sclerosis (ALS) is a progressive, multifactorial neurodegenerative disorder. Its molecular pathogenesis is difficult to understand, which makes diagnosis and treatment in the early stages challenging. Discovering novel biomarkers with diagnostic and therapeutic potential in ALS has therefore become important. Bioinformatics and machine learning algorithms are useful for identifying differentially expressed genes (DEGs) and potential biomarkers, as well as for understanding the molecular mechanisms and intricacies of diseases such as ALS. In the present study, six datasets obtained from the Gene Expression Omnibus (GEO) were analyzed using an integrative bioinformatics and machine learning approach. During data preprocessing, the data were log-transformed and RMA-normalized, and the batch effect was corrected. Differential expression analysis identified 206 DEGs that were significantly associated with biological processes including muscle function, energy metabolism, and mitochondrial membrane activity. Functional enrichment analysis highlighted pathways related to prion disease, Parkinson's disease, and ATP synthesis via chemiosmotic coupling. We employed a multi-step machine learning framework incorporating random forest, LASSO regression, and support vector machine recursive feature elimination (SVM-RFE) to identify robust biomarkers. This approach identified three key genes, CHRNA1, DLG5, and PLA2G4C, which could be explored as promising biomarkers for ALS after further validation. Internal validation, including principal component analysis (PCA) and ROC-AUC analysis, demonstrated the strong diagnostic potential of these hub genes, with an AUC of 0.96. This work highlights the utility of bioinformatics and machine learning in identifying key genes as biomarkers with diagnostic and therapeutic potential in ALS.
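The multi-step feature selection described in the abstract (random forest, LASSO, and SVM-RFE, followed by ROC-AUC assessment) can be illustrated with a minimal Python sketch using scikit-learn. This is not the authors' code. The data are synthetic (generated with `make_classification`), and the function name `select_consensus`, the `top_k` and `lasso_alpha` values, the choice to intersect the three selections, and the logistic regression used for scoring are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def select_consensus(X, y, top_k=10, lasso_alpha=0.02, seed=0):
    """Return the feature indices chosen by all three selectors (hypothetical helper)."""
    Xs = StandardScaler().fit_transform(X)

    # 1) Random forest: keep the top_k features by impurity-based importance.
    rf = RandomForestClassifier(n_estimators=300, random_state=seed).fit(Xs, y)
    rf_set = set(np.argsort(rf.feature_importances_)[-top_k:].tolist())

    # 2) LASSO: keep features whose coefficient is not shrunk to zero.
    lasso = Lasso(alpha=lasso_alpha, max_iter=10000).fit(Xs, y)
    lasso_set = set(np.flatnonzero(lasso.coef_).tolist())

    # 3) SVM-RFE: recursively drop the weakest feature of a linear SVM.
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=top_k, step=1).fit(Xs, y)
    svm_set = set(np.flatnonzero(rfe.support_).tolist())

    # Assumed consensus rule: a feature must survive all three selectors.
    return sorted(rf_set & lasso_set & svm_set)


# Synthetic stand-in for an expression matrix (samples x genes); not real ALS data.
X, y = make_classification(
    n_samples=120, n_features=50, n_informative=5, n_redundant=0,
    class_sep=2.0, shuffle=False, random_state=0,
)

consensus = select_consensus(X, y)
print("consensus feature indices:", consensus)

# Cross-validated ROC-AUC of a simple classifier restricted to the consensus features.
if consensus:
    auc = cross_val_score(
        LogisticRegression(max_iter=1000), X[:, consensus], y,
        cv=5, scoring="roc_auc",
    ).mean()
    print("mean 5-fold ROC-AUC:", round(float(auc), 3))
else:
    auc = None
    print("no feature was selected by all three methods")
```

Because the features are selected on the full dataset before cross-validation, the AUC in this sketch is optimistic. Confirming it on an independent cohort would correspond to the further validation the abstract calls for.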