Gu Mi, Yanming Di, Daniel W. Schafer
Department of Statistics, Oregon State University, Corvallis, Oregon, United States of America.
Department of Statistics, Oregon State University, Corvallis, Oregon, United States of America; Molecular and Cellular Biology Program, Oregon State University, Corvallis, Oregon, United States of America.
PLoS One. 2015 Mar 18;10(3):e0119254. doi: 10.1371/journal.pone.0119254. eCollection 2015.
This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power when using NB regression models. One widely used power-saving strategy, for example, is to assume that NB dispersion parameters share some commonality across genes, via simple models relating them to mean expression rates; many such models have been proposed. As RNA-Seq analysis becomes ever more popular, it is appropriate to investigate more thoroughly the power and robustness of the resulting methods, as well as practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective both for detecting misspecification of the NB mean-variance relationship and for judging the adequacy of fit of several NB dispersion models.
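The abstract does not specify the test statistic or the fitting procedure, so the following is only a hedged illustration of the general shape of a simulation-based (Monte Carlo, parametric bootstrap) goodness-of-fit test for an NB model whose dispersion is shared across genes. Everything concrete here is an assumption, not the authors' method: the NB2 variance mu + phi * mu^2, a single common dispersion fitted by pooled method of moments, and a statistic (absolute correlation between log fitted mean and per-gene Pearson statistic) chosen because a wrong mean-variance relationship tends to leave a trend in the Pearson statistics. Function names such as `mc_gof_test` are hypothetical.

```python
# Hypothetical sketch of a simulation-based (Monte Carlo) goodness-of-fit test
# for NB regression with a shared dispersion; NOT the authors' exact procedure.
import numpy as np


def simulate_nb(rng, mu, phi, n_rep):
    """Draw a (genes x n_rep) array of NB counts with mean mu and variance
    mu + phi * mu**2 (the "NB2" parameterization assumed here).

    numpy's negative_binomial(n, p) counts failures before n successes;
    n = 1/phi and p = 1/(1 + phi*mu) give exactly that mean and variance.
    """
    mu = np.asarray(mu, dtype=float).reshape(-1, 1)  # shape (G, 1)
    phi = np.broadcast_to(np.asarray(phi, dtype=float), (mu.shape[0],))
    phi = np.maximum(phi.reshape(-1, 1), 1e-8)  # keep 1/phi finite
    n_param = 1.0 / phi
    p = 1.0 / (1.0 + phi * mu)
    return rng.negative_binomial(n_param, p, size=(mu.shape[0], n_rep))


def fit_common_dispersion(y):
    """Per-gene means plus ONE dispersion shared by all genes, estimated by
    pooled method of moments: E[sum (y - ybar)^2] = (n-1)(mu + phi*mu^2)."""
    n = y.shape[1]
    mu = y.mean(axis=1)
    excess = ((y - mu[:, None]) ** 2).sum(axis=1) - (n - 1) * mu
    phi = max(excess.sum() / ((n - 1) * (mu ** 2).sum()), 1e-8)
    return mu, phi


def trend_statistic(y, mu, phi):
    """|corr(log mean, per-gene Pearson statistic)|. If the dispersion model
    is right, the Pearson statistics should show no trend in the mean."""
    var = np.maximum(mu + phi * mu ** 2, 1e-12)  # guard all-zero genes
    pearson = ((y - mu[:, None]) ** 2).sum(axis=1) / var
    return abs(np.corrcoef(np.log(mu + 0.5), pearson)[0, 1])


def mc_gof_test(y, n_sim=99, seed=0):
    """Monte Carlo p-value: simulate from the fitted model, refit, recompute
    the statistic, and count simulated values at least as extreme."""
    rng = np.random.default_rng(seed)
    mu, phi = fit_common_dispersion(y)
    t_obs = trend_statistic(y, mu, phi)
    exceed = 0
    for _ in range(n_sim):
        y_sim = simulate_nb(rng, mu, phi, y.shape[1])
        mu_s, phi_s = fit_common_dispersion(y_sim)
        if trend_statistic(y_sim, mu_s, phi_s) >= t_obs:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    G, n = 500, 6  # many genes, few samples, as in typical RNA-Seq designs
    mu_true = np.exp(rng.uniform(np.log(5), np.log(5000), G))
    y_ok = simulate_nb(rng, mu_true, 0.1, n)             # common dispersion holds
    y_bad = simulate_nb(rng, mu_true, 2.0 / mu_true, n)  # variance = 3*mu instead
    print("p (model adequate):", mc_gof_test(y_ok))
    print("p (misspecified):  ", mc_gof_test(y_bad))
```

The p-value uses the usual Monte Carlo form (1 + number of simulated statistics at least as large as the observed one) / (B + 1), so with B = 99 the smallest attainable value is 0.01. Refitting the model on every simulated data set is the important design choice: it lets the reference distribution account for parameter estimation, which a plug-in chi-square approximation with few replicates per gene would not.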