Osabe Takayuki, Shimizu Kentaro, Kadota Koji
Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan.
Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Japan.
Bioinform Biol Insights. 2019 Jul 8;13:1177932219860817. doi: 10.1177/1177932219860817. eCollection 2019.
Empirical Bayes is a framework of choice for differential expression (DE) analysis of multi-group RNA-seq count data. Its characteristic ability to compute posterior probabilities for predefined expression patterns allows users to assign to each gene the pattern with the highest posterior probability. However, current Bayesian methods such as baySeq and EBSeq can be improved, especially with respect to normalization. Two packages (baySeq and EBSeq), each with its default normalization settings and with other normalization methods (MRN and TCC), were compared using three-group simulation data and real count data. Our findings were as follows: (1) the Bayesian methods coupled with TCC normalization performed comparably to or better than those with the default normalization settings under various simulation scenarios, (2) the default DE pipeline provided in TCC, which implements a generalized linear model framework, was still superior to the Bayesian methods with TCC normalization when the overall degree of DE was evaluated, and (3) baySeq with TCC normalization was robust against different choices of possible expression patterns. In practice, we recommend using the default DE pipeline provided in TCC to obtain the overall gene ranking and then using baySeq with TCC normalization to assign the most plausible expression pattern to each gene.
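The pattern-assignment step described above can be illustrated with a minimal sketch. It assumes that a tool such as baySeq or EBSeq has already produced one posterior probability per candidate pattern for each gene. The pattern labels, the function name assign_pattern, and the numbers are hypothetical and are not taken from either package or from the study.

```python
# Illustrative labels for the five possible patterns among three groups
# (G1, G2, G3). These names are assumptions, not package output.
PATTERNS = [
    "G1=G2=G3",      # no differential expression
    "G1 differs",    # G1 != G2 = G3
    "G2 differs",    # G2 != G1 = G3
    "G3 differs",    # G3 != G1 = G2
    "all differ",    # G1, G2, G3 all different
]


def assign_pattern(posterior):
    """Return (label, probability) for the pattern with the highest posterior."""
    if len(posterior) != len(PATTERNS):
        raise ValueError("expected one posterior probability per pattern")
    best = max(range(len(posterior)), key=lambda i: posterior[i])
    return PATTERNS[best], posterior[best]


# Made-up posterior probabilities for three hypothetical genes.
posteriors = {
    "geneA": [0.90, 0.04, 0.03, 0.02, 0.01],
    "geneB": [0.05, 0.80, 0.05, 0.05, 0.05],
    "geneC": [0.10, 0.10, 0.10, 0.10, 0.60],
}

# Assign each gene the pattern with the highest posterior probability.
assigned = {gene: assign_pattern(p) for gene, p in posteriors.items()}
for gene, (label, prob) in assigned.items():
    print(gene, label, prob)
```

A sketch like this covers only the final assignment. Normalization and the estimation of the posterior probabilities themselves are left to the packages named in the abstract.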