Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI 48201, USA.
Bioinformatics. 2020 Mar 1;36(6):1689-1695. doi: 10.1093/bioinformatics/btz831.
Gene set enrichment analysis has been shown to be effective in identifying relevant biological pathways underlying complex diseases. Existing approaches lack the ability to quantify enrichment levels accurately, which prevents the enrichment information from being further utilized in both upstream and downstream analyses. A modernized and rigorous approach to gene set enrichment analysis that emphasizes both hypothesis testing and enrichment estimation is much needed.
We propose a novel computational method, Bayesian Analysis of Gene Set Enrichment (BAGSE), for gene set enrichment analysis. BAGSE is built on a Bayesian hierarchical model and fully accounts for the uncertainty embedded in the association evidence of individual genes. We adopt an empirical Bayes inference framework and fit the proposed hierarchical model with an efficient EM algorithm. Through simulation studies, we illustrate that BAGSE yields accurate enrichment quantification while achieving power similar to that of state-of-the-art methods. Further simulation studies show that BAGSE can effectively utilize the enrichment information to improve power in gene discovery. Finally, we demonstrate the application of BAGSE to real data from a differential expression experiment and a transcriptome-wide association study. Our results indicate that the proposed statistical framework is effective in aiding the discovery of potentially causal pathways and gene networks.
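The abstract does not spell out the model, so the following is only a minimal sketch of the general idea (an empirical Bayes EM fit of gene set enrichment, then using the fitted prior for gene discovery), not BAGSE's implementation. It assumes each gene comes with a Bayes factor `bf[i]` summarizing its association evidence and a binary gene set indicator `in_set[i]`, with prior log odds of being non-null equal to `alpha0 + alpha1 * in_set[i]`, so `alpha1` plays the role of the enrichment parameter. The names `EnrichmentFit`, `fit_enrichment`, and `discover_genes`, and the Bayesian FDR selection rule, are all hypothetical choices made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type for this sketch (not part of BAGSE).
struct EnrichmentFit {
    double alpha0;  // prior log odds that a gene outside the set is non-null
    double alpha1;  // enrichment: log odds ratio for genes inside the set
};

// Keep probabilities away from 0 and 1 so the logit stays finite.
double clamp_prob(double p) { return std::min(std::max(p, 1e-12), 1.0 - 1e-12); }
double logit(double p) { return std::log(p / (1.0 - p)); }
double inv_logit(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// E-step quantity: posterior probability that a gene is non-null, given its
// Bayes factor bf (alternative vs. null likelihood) and prior probability p.
double posterior_non_null(double bf, double p) {
    return p * bf / (p * bf + 1.0 - p);
}

// Assumed two-group model: logit P(non-null_i) = alpha0 + alpha1 * in_set[i].
EnrichmentFit fit_enrichment(const std::vector<double>& bf,
                             const std::vector<int>& in_set,
                             double tol = 1e-10, int max_iter = 10000) {
    assert(bf.size() == in_set.size());
    double p0 = 0.1, p1 = 0.1;  // starting prior non-null probabilities
    for (int it = 0; it < max_iter; ++it) {
        double sum0 = 0.0, sum1 = 0.0;
        std::size_t n0 = 0, n1 = 0;
        for (std::size_t i = 0; i < bf.size(); ++i) {
            double z = posterior_non_null(bf[i], in_set[i] ? p1 : p0);  // E-step
            if (in_set[i]) { sum1 += z; ++n1; } else { sum0 += z; ++n0; }
        }
        assert(n0 > 0 && n1 > 0);  // both groups must be non-empty
        // M-step: with a single binary annotation, the logistic-prior MLE is
        // the mean posterior probability within each group.
        double new_p0 = clamp_prob(sum0 / n0), new_p1 = clamp_prob(sum1 / n1);
        bool converged = std::fabs(new_p0 - p0) < tol && std::fabs(new_p1 - p1) < tol;
        p0 = new_p0;
        p1 = new_p1;
        if (converged) break;
    }
    return {logit(p0), logit(p1) - logit(p0)};
}

// Gene discovery with the fitted prior: recompute posteriors, then keep the
// largest top-ranked set whose mean local FDR (1 - posterior) is <= fdr_level.
std::vector<std::size_t> discover_genes(const std::vector<double>& bf,
                                        const std::vector<int>& in_set,
                                        const EnrichmentFit& fit,
                                        double fdr_level = 0.05) {
    std::vector<double> post(bf.size());
    for (std::size_t i = 0; i < bf.size(); ++i) {
        double prior = inv_logit(fit.alpha0 + fit.alpha1 * in_set[i]);
        post[i] = posterior_non_null(bf[i], prior);
    }
    std::vector<std::size_t> order(bf.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return post[a] > post[b]; });
    std::vector<std::size_t> selected;
    double sum_lfdr = 0.0;
    for (std::size_t idx : order) {
        sum_lfdr += 1.0 - post[idx];
        // Local FDRs are non-decreasing in this order, so the first
        // violation ends the selection.
        if (sum_lfdr / (selected.size() + 1) > fdr_level) break;
        selected.push_back(idx);
    }
    return selected;
}
```

On a toy input (hypothetical data: ten set genes, five with BF = 50 and five with BF = 0.1; ten non-set genes, one with BF = 50 and nine with BF = 0.1), the fit converges to alpha1 ≈ 2.46 and alpha0 ≈ -2.28, and at the default 0.05 level six genes are selected. The closed-form M-step relies on the single binary annotation assumed here; with several or continuous annotations it would become a weighted logistic regression.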
BAGSE is implemented in the C++ programming language and is freely available from https://github.com/xqwen/bagse/. Simulated and real data used in this paper are also available in the GitHub repository for reproducibility purposes.
Supplementary data are available at Bioinformatics online.