Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

Authors

Mi Gu, Di Yanming, Schafer Daniel W

Affiliations

Department of Statistics, Oregon State University, Corvallis, Oregon, United States of America.

Department of Statistics, Oregon State University, Corvallis, Oregon, United States of America; Molecular and Cellular Biology Program, Oregon State University, Corvallis, Oregon, United States of America.

Publication information

PLoS One. 2015 Mar 18;10(3):e0119254. doi: 10.1371/journal.pone.0119254. eCollection 2015.

Abstract

This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
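The idea of a simulation-based goodness-of-fit test for the NB mean-variance relationship can be illustrated with a minimal Monte Carlo sketch. This is not the authors' procedure: the function names (`nb_sample`, `pearson_stat`, `mc_gof_pvalue`), the choice of the Pearson statistic, and the simplification of treating the means `mu` and the dispersion `phi` as known (rather than re-estimating them for every simulated data set) are all assumptions made for illustration. The NB variance is parameterized here as mu + phi * mu^2.

```python
import numpy as np


def nb_sample(rng, mu, phi, size=None):
    """Draw NB counts with mean mu and variance mu + phi * mu**2."""
    n = 1.0 / phi          # NB "size" parameter
    p = n / (n + mu)       # success probability in NumPy's parameterization
    return rng.negative_binomial(n, p, size=size)


def pearson_stat(y, mu, phi):
    """Pearson statistic under the assumed NB mean-variance relationship."""
    return float(np.sum((y - mu) ** 2 / (mu + phi * mu ** 2)))


def mc_gof_pvalue(y, mu, phi, n_sim=999, seed=0):
    """Monte Carlo upper-tail p-value for the Pearson statistic.

    Assumption (illustrative only): mu and phi are treated as known,
    so simulated data sets are not refitted.
    """
    rng = np.random.default_rng(seed)
    mu = np.broadcast_to(np.asarray(mu, dtype=float), np.shape(y))
    t_obs = pearson_stat(y, mu, phi)
    # Simulate the null distribution of the statistic from the assumed model.
    t_sim = np.array(
        [pearson_stat(nb_sample(rng, mu, phi), mu, phi) for _ in range(n_sim)]
    )
    return float((1 + np.sum(t_sim >= t_obs)) / (n_sim + 1))


# Hypothetical example data: 200 counts with mean 50.
rng = np.random.default_rng(1)
mu = np.full(200, 50.0)
y_ok = nb_sample(rng, mu, 0.05)   # generated under the assumed dispersion
y_bad = nb_sample(rng, mu, 1.0)   # far more overdispersed than assumed

p_ok = mc_gof_pvalue(y_ok, mu, 0.05)
p_bad = mc_gof_pvalue(y_bad, mu, 0.05)
```

A small `p_bad` indicates that the observed counts are more variable than the assumed NB model allows. In practice the means and dispersion would be estimated from the data, and a faithful test would repeat that estimation on each simulated data set.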

Similar articles

1
Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.
PLoS One. 2015 Mar 18;10(3):e0119254. doi: 10.1371/journal.pone.0119254. eCollection 2015.
2
The level of residual dispersion variation and the power of differential expression tests for RNA-Seq data.
PLoS One. 2015 Apr 7;10(4):e0120117. doi: 10.1371/journal.pone.0120117. eCollection 2015.
5
NBLDA: negative binomial linear discriminant analysis for RNA-Seq data.
BMC Bioinformatics. 2016 Sep 13;17(1):369. doi: 10.1186/s12859-016-1208-1.
6
A two-step integrated approach to detect differentially expressed genes in RNA-Seq data.
J Bioinform Comput Biol. 2016 Dec;14(6):1650034. doi: 10.1142/S0219720016500347. Epub 2016 Sep 15.
7
Bayesian analysis of RNA sequencing data by estimating multiple shrinkage priors.
Biostatistics. 2013 Jan;14(1):113-28. doi: 10.1093/biostatistics/kxs031. Epub 2012 Sep 17.
8
Identifying differentially spliced genes from two groups of RNA-seq samples.
Gene. 2013 Apr 10;518(1):164-70. doi: 10.1016/j.gene.2012.11.045. Epub 2012 Dec 8.
9
Statistical modelling of falls count data with excess zeros.
Inj Prev. 2011 Aug;17(4):266-70. doi: 10.1136/ip.2011.031740. Epub 2011 Jun 8.
10
Higher order asymptotics for negative binomial regression inferences from RNA-sequencing data.
Stat Appl Genet Mol Biol. 2013 Mar 26;12(1):49-70. doi: 10.1515/sagmb-2012-0071.

Cited by

1
Environmental DNA allows upscaling spatial patterns of biodiversity in freshwater ecosystems.
Nat Commun. 2020 Jul 17;11(1):3585. doi: 10.1038/s41467-020-17337-8.
2
Sequence count data are poorly fit by the negative binomial distribution.
PLoS One. 2020 Apr 30;15(4):e0224909. doi: 10.1371/journal.pone.0224909. eCollection 2020.
3
DREAMSeq: An Improved Method for Analyzing Differentially Expressed Genes in RNA-seq Data.
Front Genet. 2018 Nov 30;9:588. doi: 10.3389/fgene.2018.00588. eCollection 2018.
4
A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data.
PLoS One. 2017 May 1;12(5):e0176185. doi: 10.1371/journal.pone.0176185. eCollection 2017.
7
The level of residual dispersion variation and the power of differential expression tests for RNA-Seq data.
PLoS One. 2015 Apr 7;10(4):e0120117. doi: 10.1371/journal.pone.0120117. eCollection 2015.
