Rusu Elena Cristina, Clavero-Mestres Helena, Sánchez-Álvarez Mario, Veciana-Molins Marina, Bertran Laia, Monfort-Lanzas Pablo, Aguilar Carmen, Camaron Javier, Auguet Teresa
GEMMAIR research Unit (AGAUR) - Applied Medicine (URV). Department of Medicine and Surgery. University Rovira I Virgili (URV), Health Research Institute Pere Virgili (IISPV), 43007, Tarragona, Spain; Institute for Integrative Systems Biology (I2SysBio), University of Valencia and the Spanish National Research Council (CSIC), 46980, Valencia, Spain.
GEMMAIR research Unit (AGAUR) - Applied Medicine (URV). Department of Medicine and Surgery. University Rovira I Virgili (URV), Health Research Institute Pere Virgili (IISPV), 43007, Tarragona, Spain.
Comput Biol Med. 2025 Jun;191:110170. doi: 10.1016/j.compbiomed.2025.110170. Epub 2025 Apr 12.
Metabolic-associated steatohepatitis (MASH), the progressive form of metabolic-associated steatotic liver disease (MASLD), poses significant risks for liver fibrosis and cardiovascular complications. Despite extensive research, reliable biomarkers for MASH diagnosis and progression remain elusive. This study aimed to identify hepatic transcriptomic and circulating proteomic signatures specific to MASH, and to develop a machine learning-based biomarker discovery model.
A systematic review of RNA-Seq and proteomic datasets was conducted, retrieving 7 hepatic transcriptomics and 3 circulating proteomics studies, encompassing 483 liver samples and 169 serum/plasma samples, respectively. Differential gene and protein expression analyses were performed, and pathway enrichment was assessed using gene set enrichment analysis. A machine learning (ML) model using biologically significant protein ratios was developed to identify MASH-specific biomarkers.
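The abstract does not say how the protein ratios were constructed. As a minimal sketch of one common approach, the snippet below derives pairwise log2 ratios from a single sample's protein abundances. The function name `pairwise_log_ratios`, the protein labels `P1` to `P3`, and the abundance values are hypothetical placeholders, not taken from the study.

```python
from itertools import combinations
from math import log2


def pairwise_log_ratios(sample):
    """Return log2(a / b) for every unordered protein pair in one sample.

    `sample` maps a protein label to a strictly positive abundance.
    Labels are sorted so that feature names are reproducible.
    """
    features = {}
    for a, b in combinations(sorted(sample), 2):
        features[f"{a}/{b}"] = log2(sample[a] / sample[b])
    return features


# Hypothetical abundances for illustration only (not data from the study).
sample = {"P1": 8.0, "P2": 2.0, "P3": 4.0}
ratios = pairwise_log_ratios(sample)
for name, value in ratios.items():
    print(f"{name}: {value:+.2f}")
```

Ratio features of this kind cancel any per-sample scaling factor shared by both proteins, which is one reason they are often considered when samples come from different cohorts or platforms. Whether the authors relied on this property is not stated in the abstract.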
Hepatic transcriptomic analysis identified 5017 differentially expressed genes (DEGs), with significant enrichment of extracellular matrix (ECM) pathways. Serum proteomics revealed six differentially expressed proteins (DEPs), including complement-related proteins. Integration of transcriptomic and proteomic data highlighted the complement cascade as a key pathway in MASH, with discordant regulation between the liver and circulation. The ML-based biomarker discovery model, utilizing protein ratios, achieved F1 scores of 0.83 and 0.64 in the training sets and 0.67 in an external validation set.
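For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall for the positive class. The sketch below computes it from binary labels, assuming MASH is coded as 1 and non-MASH as 0. The labels and predictions are invented for illustration and do not reproduce the reported scores.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute the F1 score of the positive class from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = MASH, 0 = non-MASH (illustrative only).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the F1 score is 0.75.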
Our findings indicate ECM deregulation and complement system involvement in MASH progression. The novel ML model incorporating protein ratios offers a potential tool for MASH diagnosis. However, further refinement and validation across larger and more diverse cohorts are needed to generalize these results.