The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel.
Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel.
Mol Syst Biol. 2021 Jan;17(1):e9593. doi: 10.15252/msb.20209593.
Algorithms for active module identification (AMI) are central to the analysis of omics data. Such algorithms receive a gene network and activity scores for its nodes as input and report subnetworks that show significant over-representation of accrued activity signal ("active modules"), which are taken to represent biological processes that presumably play key roles in the analyzed conditions. Here, we systematically evaluated six popular AMI methods on gene expression (GE) and GWAS data. We observed that Gene Ontology (GO) terms enriched in modules detected on the real data were often also enriched in modules found on randomly permuted data. This indicated that AMI methods frequently report modules that are not specific to the biological context measured by the analyzed omics dataset. To tackle this bias, we designed a permutation-based method that empirically evaluates GO terms reported by AMI methods. We used the method to fashion five novel AMI performance criteria. Last, we developed DOMINO, a novel AMI algorithm that outperformed the other six algorithms in extensive testing on GE and GWAS data. Software is available at https://github.com/Shamir-Lab.
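The idea behind a permutation-based evaluation of GO terms can be illustrated with a minimal sketch. It is not the published procedure: the function `empirical_pvalues`, the callback `run_pipeline` (standing in for "run an AMI method, then score GO enrichment on its modules"), the toy gene names, and the toy term definitions are all hypothetical and chosen only for illustration. The sketch shuffles activity scores across genes, reruns the pipeline on each shuffled dataset, and reports for every term the fraction of permutations whose enrichment score is at least as high as the one obtained on the real data.

```python
import random
from typing import Callable, Dict, List

Scores = Dict[str, float]


def empirical_pvalues(
    scores: Scores,
    run_pipeline: Callable[[Scores], Dict[str, float]],
    n_permutations: int = 200,
    seed: int = 0,
) -> Dict[str, float]:
    """Empirical p-value per term from score permutations (illustrative only)."""
    rng = random.Random(seed)
    genes = list(scores)
    observed = run_pipeline(scores)          # term -> enrichment score on real data
    exceed = {term: 0 for term in observed}

    for _ in range(n_permutations):
        shuffled = list(scores.values())
        rng.shuffle(shuffled)                # break the gene-to-score association
        null = run_pipeline(dict(zip(genes, shuffled)))
        for term, obs in observed.items():
            if null.get(term, 0.0) >= obs:
                exceed[term] += 1

    # Add-one correction so that no empirical p-value is exactly zero.
    return {t: (c + 1) / (n_permutations + 1) for t, c in exceed.items()}


# Hypothetical toy data: ten genes with integer-valued activity scores.
toy_scores: Scores = {f"g{i}": float(i) for i in range(10)}

# Hypothetical term definitions: one term covers every gene, one covers
# only the most active gene.
toy_terms: Dict[str, List[str]] = {
    "background": list(toy_scores),
    "specific": ["g9"],
}


def toy_pipeline(scores: Scores) -> Dict[str, float]:
    """Stand-in for AMI + GO enrichment: mean activity of each term's genes."""
    return {
        term: sum(scores[g] for g in members) / len(members)
        for term, members in toy_terms.items()
    }


pvals = empirical_pvalues(toy_scores, toy_pipeline)
```

In this toy setting the "background" term scores the same on every permutation, so its empirical p-value is 1 and it would be flagged as non-specific, whereas the "specific" term rarely reaches its real-data score under permutation. A real analysis would replace `toy_pipeline` with an actual AMI run followed by a GO enrichment test.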