Di Yanming
Department of Statistics, Oregon State University, Corvallis, OR 97331, USA.
Stat Interface. 2015;8(4):405-418. doi: 10.4310/SII.2015.v8.n4.a1.
We consider negative binomial (NB) regression models for RNA-Seq read counts and investigate an approach in which such NB regression models are fitted to each gene separately and, in particular, the NB dispersion parameter is estimated from each gene separately without assuming commonalities between genes. This single-gene approach contrasts with the more widely used dispersion-modeling approach, in which the NB dispersion is modeled as a simple function of the mean or other measures of read abundance and then estimated from a large number of genes combined. We show that, through the use of higher-order asymptotic techniques, inferences with correct type I errors can be made about the regression coefficients in a single-gene NB regression model even when the dispersion is unknown and the sample size is small. The motivations for studying single-gene models include: 1) they provide a basis of reference for understanding and quantifying the power-robustness trade-offs of the dispersion-modeling approach; 2) they may be useful in practice if moderate sample sizes become available and diagnostic tools indicate potential problems with simple models of dispersion.
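As a rough illustration of what fitting an NB model to one gene with its own dispersion can look like, here is a minimal sketch for a two-group comparison. It rests on assumptions not stated in the abstract: the common parameterization with variance mu + phi * mu^2, no library-size offsets (so that, for a fixed dispersion, each group's maximum-likelihood mean is its sample mean), a simple grid search over the dispersion, and made-up counts. The function names are hypothetical. The p-value uses the ordinary first-order chi-square approximation to the likelihood ratio statistic; it is not the higher-order asymptotic adjustment the abstract refers to.

```python
import math


def nb_loglik(y, mu, phi):
    """NB log-likelihood of counts y with mean mu > 0 and dispersion phi > 0
    (assumed parameterization: variance = mu + phi * mu**2)."""
    k = 1.0 / phi  # NB "size" parameter
    total = 0.0
    for yi in y:
        total += (math.lgamma(yi + k) - math.lgamma(k) - math.lgamma(yi + 1)
                  + k * math.log(k / (k + mu)) + yi * math.log(mu / (k + mu)))
    return total


def profile_fit(groups, phi_grid):
    """Maximize the likelihood for one gene: each group has its own mean,
    all groups share one dispersion. Without offsets, the MLE of each group
    mean is the sample mean for any fixed phi, so only phi is searched."""
    means = [sum(g) / len(g) for g in groups]
    best_ll, best_phi = -math.inf, None
    for phi in phi_grid:
        ll = sum(nb_loglik(g, m, phi) for g, m in zip(groups, means))
        if ll > best_ll:
            best_ll, best_phi = ll, phi
    return best_ll, best_phi


def single_gene_lr_test(group_a, group_b, phi_grid=None):
    """First-order likelihood ratio test of equal means for a single gene,
    with the dispersion estimated from this gene alone."""
    if phi_grid is None:
        # Log-spaced grid from 1e-4 to 1e2 (an arbitrary illustrative range).
        phi_grid = [10 ** (-4 + 6 * i / 300) for i in range(301)]
    ll_null, _ = profile_fit([group_a + group_b], phi_grid)   # one common mean
    ll_alt, phi_hat = profile_fit([group_a, group_b], phi_grid)  # two means
    lr = max(0.0, 2.0 * (ll_alt - ll_null))
    # Upper tail of a chi-square with 1 degree of freedom: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value, phi_hat


# Hypothetical read counts for one gene in two groups of three samples.
lr, p_value, phi_hat = single_gene_lr_test([12, 30, 18], [55, 41, 90])
print(f"LR = {lr:.3f}, p = {p_value:.4f}, dispersion = {phi_hat:.4f}")
```

With only a few samples per group, the dispersion estimate from a single gene is noisy and the chi-square approximation can be inaccurate, which is the setting in which the abstract proposes higher-order asymptotic techniques.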