Ghosh Subhajit, Mandal Sukhen Das, Thakur Subarna
Department of Bioinformatics, University of North Bengal, Darjeeling, West Bengal, India.
Department of Computer Science and Engineering, Ghani Khan Choudhury Institute of Engineering and Technology (GKCIET), Malda, India.
Front Bioinform. 2025 Apr 17;5:1522401. doi: 10.3389/fbinf.2025.1522401. eCollection 2025.
The incidence of non-alcoholic fatty liver disease (NAFLD), including its more severe form, non-alcoholic steatohepatitis (NASH), is rising alongside the surges in diabetes and obesity. Increasing evidence indicates that NASH is responsible for a significant share of idiopathic hepatocellular carcinoma (HCC) cases, a fatal cancer with a 5-year survival rate below 22%. Biomarkers can facilitate early screening and monitoring of at-risk NAFLD/NASH patients and assist in identifying potential drug candidates for treatment. This study used an ensemble feature selection framework to analyze transcriptomic data and identify biomarker genes associated with the stage-wise progression of NAFLD-related HCC. Seven machine learning algorithms were assessed for disease stage classification. Twelve feature selection methods, including correlation-based techniques, mutual information-based methods, and embedded techniques, were used to rank genes as features; combining multiple feature selection methods in this way yielded more robust features relevant to disease progression. Cox regression-based survival analysis was carried out to evaluate the biomarker potential of these genes. Furthermore, a multiphase drug repurposing strategy and molecular docking were employed to identify potential drug candidates against these biomarkers. Among the seven machine learning models initially evaluated, DISCR was the most accurate disease stage classifier. Ensemble feature selection identified ten top genes, of which eight were recognized as potential biomarkers based on survival analysis. These include ABAT, ABCB11, MBTPS1, and ZFP1, which are mostly involved in alanine and glutamate metabolism, butanoate metabolism, and ER protein processing. Through drug repurposing, 81 candidate drugs were found to be effective against these marker genes, with Diosmin, Esculin, Lapatinib, and Phenelzine emerging as the best candidates after screening by molecular docking and MMGBSA. The consensus derived from multiple methods enhances the accuracy of identifying relevant, robust biomarkers for NAFLD-associated HCC. The use of these biomarkers in a multiphase drug repurposing strategy highlights potential therapeutic options for early intervention, which is essential to stop disease progression and improve outcomes.
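The idea of combining several feature selection methods into a consensus gene list can be illustrated with a minimal sketch. The abstract does not state how the twelve rankings were merged, so the mean-rank (Borda-style) aggregation below is an assumption, and the function name `aggregate_rankings` and the gene labels `GENE_A` to `GENE_D` are hypothetical placeholders rather than the study's method or data.

```python
from collections import defaultdict


def aggregate_rankings(rankings, top_k):
    """Merge ranked gene lists by mean rank (an assumed aggregation rule).

    rankings: list of lists, each ordered from most to least important.
    top_k:    number of consensus genes to return.
    """
    genes = sorted({gene for ranking in rankings for gene in ranking})
    total_rank = defaultdict(float)
    for ranking in rankings:
        position = {gene: i + 1 for i, gene in enumerate(ranking)}
        # A gene that a method did not rank gets a penalty rank just past
        # the end of that method's list.
        penalty = len(ranking) + 1
        for gene in genes:
            total_rank[gene] += position.get(gene, penalty)
    mean_rank = {gene: total_rank[gene] / len(rankings) for gene in genes}
    # Lower mean rank is better; ties are broken alphabetically.
    return sorted(genes, key=lambda g: (mean_rank[g], g))[:top_k]


# Toy rankings standing in for the output of three different selectors.
rankings = [
    ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    ["GENE_B", "GENE_A", "GENE_D", "GENE_C"],
    ["GENE_A", "GENE_C", "GENE_B", "GENE_D"],
]
consensus = aggregate_rankings(rankings, top_k=2)
print(consensus)
```

A gene that ranks well under only one method is pushed down by its poorer ranks elsewhere, which is one plausible reading of why a consensus list would be more robust than any single selector's output.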