

Sequence count data are poorly fit by the negative binomial distribution.

Affiliations

Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium.

Centre for Computer-Assisted Research in Mathematics and its Applications, School of Mathematical and Physical Sciences, University of Newcastle, Newcastle, Australia.

Publication information

PLoS One. 2020 Apr 30;15(4):e0224909. doi: 10.1371/journal.pone.0224909. eCollection 2020.

Abstract

Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness of fit test for the NB distribution in regression models and demonstrate that the NB-assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that the NB-based tests perform worse on the features for which the NB-assumption was violated than on the features for which no significant deviation was detected. This gives an explanation for the poor behaviour of NB-based tests in many published evaluation studies. We conclude that nonparametric tests should be preferred over parametric methods.
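To make the NB-assumption concrete, the following minimal sketch fits an NB distribution to the counts of a single feature by the method of moments and compares the observed fraction of zeros with the fraction the fitted NB would predict. It is an informal illustration only and is not the goodness of fit test proposed in the paper. The function names, the mean-dispersion parametrisation (variance = mu + phi * mu^2), and the example counts are all assumptions made for this sketch.

```python
import math
from statistics import mean, variance


def nb_moment_fit(counts):
    """Method-of-moments NB fit (illustrative helper, not from the paper).

    Returns (mu, phi) under the parametrisation Var(Y) = mu + phi * mu**2.
    """
    mu = mean(counts)
    if mu == 0:
        raise ValueError("all counts are zero; the NB fit is undefined")
    var = variance(counts)  # sample variance with n - 1 denominator
    # Clamp at zero: phi == 0 corresponds to the Poisson limit.
    phi = max((var - mu) / mu**2, 0.0)
    return mu, phi


def nb_zero_probability(mu, phi):
    """P(Y = 0) under NB(mu, phi), using the Poisson limit when phi == 0."""
    if phi == 0.0:
        return math.exp(-mu)
    return (1.0 + phi * mu) ** (-1.0 / phi)


# Hypothetical counts for one feature across ten samples.
counts = [0, 0, 1, 3, 8, 0, 2, 12, 0, 4]
mu, phi = nb_moment_fit(counts)
observed_zero_fraction = counts.count(0) / len(counts)
expected_zero_fraction = nb_zero_probability(mu, phi)
print(f"mu={mu:.3f}, phi={phi:.3f}")
print(f"observed zeros: {observed_zero_fraction:.3f}, "
      f"NB-expected zeros: {expected_zero_fraction:.3f}")
```

A large gap between the observed and NB-expected zero fractions hints at a poor fit for that feature, but a formal conclusion needs a proper test that accounts for sampling variability and covariates, as the regression-based test described in the abstract does.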


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc8e/7192467/ed1519711a08/pone.0224909.g001.jpg
