Park Heewon, Imoto Seiya, Miyano Satoru
School of Mathematics, Statistics and Data Science, Sungshin Women's University, Seoul, Korea.
Data Science Center, Sungshin Women's University, Seoul, Korea.
Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf378.
Gene network analysis is essential for understanding the complex mechanisms underlying diseases, which often involve disruptions in molecular networks rather than individual genes. Despite the availability of large-scale omics datasets and computational tools for gene network analysis, interpretation of the biological relevance of these extensive networks remains challenging.
We propose a novel computational strategy, gene behaviors-based network enrichment analysis, which systematically identifies functional pathways enriched in phenotype-specific gene networks. The method incorporates comprehensive network characteristics, namely gene expression levels, edge strengths, and structural patterns of edges, to rank genes by activity and assess pathway enrichment. In simulation studies, our strategy outperformed existing methods in identifying enriched pathways. We applied this strategy to whole-blood RNA-seq data from 1102 COVID-19 samples provided by the Japan COVID-19 Task Force. The analysis revealed immune disease pathways enriched in COVID-19 severity-specific gene networks, including "Systemic lupus erythematosus" in asymptomatic and severe samples and "Inflammatory bowel disease," "Primary immunodeficiency," and "Rheumatoid arthritis" in mild samples. Key biomarkers of COVID-19, such as CXCL8, S100A9, and HLA class I genes, were identified as critical hub genes and main players within these networks.
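To make the general idea of ranking genes by network activity and then testing pathway enrichment concrete, the following is a minimal sketch and not the authors' implementation. The activity score (mean expression multiplied by one plus the weighted degree), the permutation test, and the function names `gene_activity` and `pathway_enrichment` are all assumptions chosen for illustration. The structural patterns of edges used by the published method are not modeled here.

```python
import random
from statistics import mean


def gene_activity(expression, edges):
    """Toy activity score (an assumption, not the published formula).

    expression: dict mapping gene -> list of expression values across samples
    edges: dict mapping (gene_a, gene_b) -> edge strength
    Returns a dict mapping gene -> mean expression * (1 + weighted degree).
    """
    strength = {gene: 0.0 for gene in expression}
    for (a, b), weight in edges.items():
        # Each edge contributes its absolute strength to both endpoints.
        strength[a] += abs(weight)
        strength[b] += abs(weight)
    return {g: mean(expression[g]) * (1.0 + strength[g]) for g in expression}


def pathway_enrichment(activity, pathway, n_perm=2000, seed=0):
    """Permutation test: is the pathway's mean activity unusually high?

    Returns (observed mean activity, empirical one-sided p-value).
    """
    members = [g for g in pathway if g in activity]
    if not members:
        raise ValueError("pathway shares no genes with the network")
    rng = random.Random(seed)
    genes = list(activity)
    observed = mean(activity[g] for g in members)
    hits = 0
    for _ in range(n_perm):
        # Draw a random gene set of the same size as the pathway.
        sample = rng.sample(genes, len(members))
        if mean(activity[g] for g in sample) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: gene names and values are invented.
    expression = {"A": [8, 9, 10], "B": [7, 8, 9], "C": [2, 3, 2],
                  "D": [1, 2, 1], "E": [3, 2, 3], "F": [1, 1, 1]}
    edges = {("A", "B"): 0.9, ("A", "C"): 0.4,
             ("B", "D"): 0.3, ("E", "F"): 0.1}
    activity = gene_activity(expression, edges)
    print(pathway_enrichment(activity, {"A", "B"}))
```

A permutation test is used here only because it needs no distributional assumptions; the published method may assess enrichment differently.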
Code is available on Figshare (https://doi.org/10.6084/m9.figshare.29093648.v3).