Bayesian analysis of RNA sequencing data by estimating multiple shrinkage priors.

Author information

Department of Epidemiology and Biostatistics, VU University (Medical Center), PO Box 7057, 1007 MB Amsterdam, The Netherlands.

Publication information

Biostatistics. 2013 Jan;14(1):113-28. doi: 10.1093/biostatistics/kxs031. Epub 2012 Sep 17.

Abstract

Next-generation sequencing is quickly replacing microarrays as a technique to probe different molecular levels of the cell, such as DNA or RNA. The technology provides higher resolution while reducing bias. RNA sequencing yields counts of RNA strands, a type of data that poses new statistical challenges. We present a novel, generic approach to model and analyze such data. Our approach aims at high flexibility in both the likelihood (count) model and the regression model. Hence, a variety of count models is supported, such as the popular negative binomial (NB) model, which accounts for overdispersion. In addition, complex, non-balanced designs and random effects are accommodated. Like some other methods, our method provides shrinkage of dispersion-related parameters. However, we extend it by enabling joint shrinkage of parameters, including those for which inference is desired. We argue that this is essential for Bayesian multiplicity correction. Shrinkage is achieved by empirically estimating priors. We discuss several parametric (mixture) and non-parametric priors and develop procedures to estimate those priors or their parameters. Inference is provided by means of local and Bayesian false discovery rates. We illustrate our method on several simulations and two data sets, and use these to compare it with other methods. Model- and data-based simulations show substantial improvements in sensitivity at a given specificity. The data motivate the use of the zero-inflated negative binomial (ZI-NB) model as a powerful alternative to the NB, which results in higher detection rates for low-count data. Finally, compared with other methods, the results on small sample subsets are more reproducible when validated on their large sample complements, illustrating the importance of the type of shrinkage.
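To make the count models named in the abstract concrete, the following is a minimal sketch of the NB and ZI-NB probability mass functions in their textbook form. It is not the authors' implementation. The mean/size parameterization, the function names, and the numeric values are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def nb_pmf(y, mu, size):
    """Negative binomial pmf with mean `mu` and size (inverse dispersion) `size`.

    Under this (assumed) parameterization the variance is mu + mu**2 / size,
    so it exceeds the Poisson variance mu: this is the overdispersion.
    """
    log_p = (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, size, zero_prob):
    """Zero-inflated NB: with probability `zero_prob` the count is a
    structural zero, otherwise it is drawn from the NB distribution."""
    base = (1.0 - zero_prob) * nb_pmf(y, mu, size)
    return zero_prob + base if y == 0 else base


def moments(pmf, upper=2000):
    """Mean and variance of a count pmf, summed over 0..upper-1.

    The truncation point is an illustrative choice; the tail mass beyond it
    is negligible for the parameter values used below.
    """
    mean = sum(y * pmf(y) for y in range(upper))
    var = sum((y - mean) ** 2 * pmf(y) for y in range(upper))
    return mean, var


# Hypothetical parameter values, for illustration only.
MU, SIZE, ZERO_PROB = 5.0, 2.0, 0.3

nb_mean, nb_var = moments(lambda y: nb_pmf(y, MU, SIZE))
zinb_mean, zinb_var = moments(lambda y: zinb_pmf(y, MU, SIZE, ZERO_PROB))

print(f"NB:    mean={nb_mean:.3f} variance={nb_var:.3f}")
print(f"ZI-NB: mean={zinb_mean:.3f} variance={zinb_var:.3f}")
print(f"P(0) under NB:    {nb_pmf(0, MU, SIZE):.4f}")
print(f"P(0) under ZI-NB: {zinb_pmf(0, MU, SIZE, ZERO_PROB):.4f}")
```

With these values the NB variance (mu + mu^2/size) is well above the mean, and the ZI-NB places more mass on zero than the NB with the same mu and size. That extra mass at zero is the feature that matters for low-count data.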

