Single-gene negative binomial regression models for RNA-Seq data with higher-order asymptotic inference.

Author information

Di Yanming

Affiliation

Department of Statistics, Oregon State University, Corvallis, OR 97331, USA.

Publication information

Stat Interface. 2015;8(4):405-418. doi: 10.4310/SII.2015.v8.n4.a1.

DOI:10.4310/SII.2015.v8.n4.a1

PMID:28042360

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5193394/

Abstract

We consider negative binomial (NB) regression models for RNA-Seq read counts and investigate an approach where such NB regression models are fitted to individual genes separately and, in particular, the NB dispersion parameter is estimated from each gene separately without assuming commonalities between genes. This single-gene approach contrasts with the more widely-used dispersion-modeling approach where the NB dispersion is modeled as a simple function of the mean or other measures of read abundance, and then estimated from a large number of genes combined. We show that through the use of higher-order asymptotic techniques, inferences with correct type I errors can be made about the regression coefficients in a single-gene NB regression model even when the dispersion is unknown and the sample size is small. The motivations for studying single-gene models include: 1) they provide a basis of reference for understanding and quantifying the power-robustness trade-offs of the dispersion-modeling approach; 2) they can also be potentially useful in practice if moderate sample sizes become available and diagnostic tools indicate potential problems with simple models of dispersion.
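To make the single-gene idea concrete, the following is a minimal sketch of fitting an NB model to one gene on its own and testing a two-group difference. It is not the paper's method: it assumes the NB2 parameterization (variance = mu + phi * mu^2), equal library sizes (no offsets), a two-group design, and a plain grid search for the dispersion. It uses the ordinary first-order likelihood ratio statistic with a chi-square reference and does not implement the higher-order asymptotic adjustment the abstract refers to. The function names and the example counts are hypothetical.

```python
import math

def nb_loglik(counts, mu, phi):
    """NB log-likelihood under the assumed NB2 form: Var(Y) = mu + phi * mu**2.
    Requires mu > 0 and phi > 0."""
    k = 1.0 / phi  # NB "size" parameter
    total = 0.0
    for y in counts:
        total += (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
                  + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return total

def fit(groups, phi_grid):
    """Fit one gene: a separate mean per group and a single dispersion.
    Without offsets, the MLE of each group mean is the sample mean for any
    fixed dispersion, so only the dispersion needs a numerical search."""
    means = [sum(g) / len(g) for g in groups]

    def loglik(phi):
        return sum(nb_loglik(g, m, phi) for g, m in zip(groups, means))

    phi_hat = max(phi_grid, key=loglik)  # crude grid search (illustrative only)
    return means, phi_hat, loglik(phi_hat)

def lr_test(control, treated):
    """Unadjusted likelihood ratio test of equal means for a single gene."""
    phi_grid = [10 ** (-4 + 6 * i / 600) for i in range(601)]  # 1e-4 .. 1e2
    _, _, ll_null = fit([control + treated], phi_grid)          # one common mean
    means, phi_hat, ll_alt = fit([control, treated], phi_grid)  # one mean per group
    lr = max(0.0, 2.0 * (ll_alt - ll_null))
    p_value = math.erfc(math.sqrt(lr / 2.0))  # chi-square(1) upper tail
    return means, phi_hat, lr, p_value

# Hypothetical read counts for one gene, four replicates per group.
control = [12, 30, 18, 25]
treated = [60, 95, 41, 120]
means, phi_hat, lr, p_value = lr_test(control, treated)
print(f"means={means}, dispersion={phi_hat:.3f}, LR={lr:.2f}, p={p_value:.4f}")
```

With only a few replicates per group, the dispersion estimate from a single gene is noisy and the chi-square approximation for this unadjusted statistic can be inaccurate, which is the small-sample problem the higher-order techniques in the abstract are meant to address.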
