Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, N2L 3G1, Canada.
Department of Statistics, Iowa State University, Ames, 50011, USA.
BMC Bioinformatics. 2018 Mar 27;19(1):107. doi: 10.1186/s12859-018-2106-5.
Testing predefined gene categories has become a common practice for scientists analyzing high-throughput transcriptome data. Testing gene categories systematically means testing hundreds of null hypotheses that correspond to nodes in a directed acyclic graph. The relationships among gene categories induce logical restrictions among the corresponding null hypotheses. An existing fully Bayesian method is powerful but computationally demanding.
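As an illustration of what "honoring the logical restrictions" can mean, the sketch below assumes the usual Gene Ontology convention: a parent category contains all genes of its child categories, so rejecting a child's null hypothesis forces rejection of every ancestor's null hypothesis. The function name `honors_restrictions` and the node labels are hypothetical and are not part of the HMTGO package. Checking direct parents is enough, because closure under parents implies closure under all ancestors.

```python
def honors_restrictions(parents, rejected):
    """Return True if a set of rejected hypotheses respects the DAG restriction.

    Hypothetical helper. parents maps each node to the list of its parent
    nodes; rejected is the set of nodes whose null hypotheses are rejected.
    Assumed restriction: a rejected node must have every parent rejected,
    which by induction means every ancestor is rejected as well.
    """
    return all(p in rejected for v in rejected for p in parents.get(v, []))


# Small DAG: A is the root, B and C are children of A, D has parents B and C.
dag = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(honors_restrictions(dag, {"A", "B"}))       # True
print(honors_restrictions(dag, {"A", "B", "D"}))  # False: D's parent C is not rejected
```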
We develop a computationally efficient method based on a hidden Markov tree model (HMTM). Our method is several orders of magnitude faster than the existing fully Bayesian method. Through simulations and an expression quantitative trait loci (eQTL) study, we show that the HMTM method is more powerful than other existing methods that honor the logical restrictions.
For each gene set, the HMTM method provides an individual estimate of the posterior probability that the set is differentially expressed, which can be useful for interpreting results. The R package is available at https://github.com/k22liang/HMTGO .
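To show how a hidden Markov tree can yield one posterior probability per gene set, here is a minimal sketch of the generic upward-downward recursion for a two-state hidden Markov tree. It is not the HMTGO implementation. It assumes that the DAG has been reduced to a single tree, that the per-node likelihoods `lik0`/`lik1` under "null true" and "null false" are already available, and that the parameters `pi_root` and `q` are fixed rather than estimated. The function name `hmt_posterior` is hypothetical. Setting P(child non-null | parent null) = 0 is one simple way to build the logical restriction into the transition probabilities.

```python
from collections import deque


def hmt_posterior(parent, lik0, lik1, pi_root=0.5, q=0.5):
    """Posterior P(S_v = 1 | data) for every node of a two-state hidden Markov tree.

    Hypothetical sketch. parent[v] is v's parent index (-1 for the root);
    lik0[v] / lik1[v] are likelihoods of node v's statistic under state 0
    (null true) and state 1 (null false). Assumptions: a single tree rather
    than a general DAG, fixed parameters, and no rescaling against underflow.
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    root = None
    for v, p in enumerate(parent):
        if p < 0:
            root = v
        else:
            children[p].append(v)

    # A[s][t] = P(S_child = t | S_parent = s). Row 0 forbids a non-null
    # child under a null parent, which encodes the logical restriction.
    A = [[1.0, 0.0], [1.0 - q, q]]
    lik = [lik0, lik1]

    # Breadth-first order from the root, so parents precede children.
    order = []
    queue = deque([root])
    while queue:
        v = queue.popleft()
        order.append(v)
        queue.extend(children[v])

    # Upward pass: beta[v][s] = P(data in subtree of v | S_v = s);
    # msg[c][s] = sum_t A[s][t] * beta[c][t], child c's message to its parent.
    beta = [[0.0, 0.0] for _ in range(n)]
    msg = [[0.0, 0.0] for _ in range(n)]
    for v in reversed(order):
        for s in (0, 1):
            b = lik[s][v]
            for c in children[v]:
                b *= msg[c][s]
            beta[v][s] = b
        for s in (0, 1):
            msg[v][s] = A[s][0] * beta[v][0] + A[s][1] * beta[v][1]

    # Downward pass: alpha[v][s] = P(S_v = s, data outside subtree of v).
    alpha = [[0.0, 0.0] for _ in range(n)]
    alpha[root] = [1.0 - pi_root, pi_root]
    for v in order:
        for c in children[v]:
            for t in (0, 1):
                total = 0.0
                for s in (0, 1):
                    # Everything known about v except the subtree of c.
                    w = alpha[v][s] * lik[s][v]
                    for c2 in children[v]:
                        if c2 != c:
                            w *= msg[c2][s]
                    total += w * A[s][t]
                alpha[c][t] = total

    # Posterior: P(S_v = s | data) is proportional to alpha[v][s] * beta[v][s].
    post = []
    for v in range(n):
        num0 = alpha[v][0] * beta[v][0]
        num1 = alpha[v][1] * beta[v][1]
        post.append(num1 / (num0 + num1))
    return post
```

Because a non-null child forces a non-null parent under this model, the resulting posteriors never increase from parent to child, so any thresholding of them automatically respects the restriction.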