Sequence count data are poorly fit by the negative binomial distribution.

Affiliations

Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium.

Centre for Computer-Assisted Research in Mathematics and its Applications, School of Mathematical and Physical Sciences, University of Newcastle, Newcastle, Australia.

Publication details

PLoS One. 2020 Apr 30;15(4):e0224909. doi: 10.1371/journal.pone.0224909. eCollection 2020.

PMID: 32352970

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7192467/

Abstract

Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB-assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness of fit test for the NB distribution in regression models and demonstrate that the NB-assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that the NB-based tests perform worse on the features for which the NB-assumption was violated than on the features for which no significant deviation was detected. This gives an explanation for the poor behaviour of NB-based tests in many published evaluation studies. We conclude that nonparametric tests should be preferred over parametric methods.
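The abstract does not say how the proposed goodness-of-fit test is constructed, so what follows is only a generic illustration of testing an NB assumption. It is not the authors' method. It is a minimal Python sketch under simplifying assumptions: a single feature with no covariates (an intercept-only model rather than the regression setting the paper addresses), method-of-moments estimates of the mean μ and size θ (parameterised so that Var(Y) = μ + μ²/θ), a Pearson chi-square statistic over the count bins 0, 1, ..., B-1 plus a tail bin (≥ B), and a parametric bootstrap that refits the model to every simulated sample. All function names are hypothetical, and only the Python standard library is used.

```python
import math
import random


def nb_pmf(k: int, mu: float, size: float) -> float:
    """P(Y = k) under a negative binomial with mean mu and size theta,
    parameterised so that Var(Y) = mu + mu**2 / size."""
    if mu <= 0:
        return 1.0 if k == 0 else 0.0
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             - size * math.log1p(mu / size)
             + k * math.log(mu / (size + mu)))
    return math.exp(log_p)


def fit_nb_moments(y: list[int]) -> tuple[float, float]:
    """Method-of-moments (mu, size). Falls back to a near-Poisson fit
    (very large size) when the sample shows no overdispersion."""
    n = len(y)
    mu = sum(y) / n
    var = sum((v - mu) ** 2 for v in y) / (n - 1)
    if var <= mu:
        return mu, 1e8
    return mu, mu ** 2 / (var - mu)


def sample_nb(mu: float, size: float, rng: random.Random) -> int:
    """Draw one NB variate via the gamma-Poisson mixture representation."""
    if mu <= 0:
        return 0
    lam = rng.gammavariate(size, mu / size)  # gamma with mean mu
    if lam <= 0:
        return 0
    # Poisson(lam): count arrivals of a rate-lam process in [0, 1].
    # Costs O(lam) per draw, so this is only suitable for moderate means.
    count, t = 0, 0.0
    while True:
        t += rng.expovariate(lam)
        if t > 1.0:
            return count
        count += 1


def pearson_stat(y: list[int], mu: float, size: float, n_bins: int) -> float:
    """Pearson chi-square over bins 0..n_bins-1 and a tail bin >= n_bins."""
    n = len(y)
    probs = [nb_pmf(k, mu, size) for k in range(n_bins)]
    probs.append(max(1.0 - sum(probs), 0.0))  # tail probability
    observed = [0] * (n_bins + 1)
    for v in y:
        observed[min(v, n_bins)] += 1
    stat = 0.0
    for o, p in zip(observed, probs):
        e = max(n * p, 1e-12)  # guard against division by zero
        stat += (o - e) ** 2 / e
    return stat


def nb_gof_bootstrap(y: list[int], n_boot: int = 199, n_bins: int = 10,
                     seed: int = 1) -> float:
    """Parametric-bootstrap p-value for H0: y is an i.i.d. NB sample."""
    rng = random.Random(seed)
    mu, size = fit_nb_moments(y)
    observed = pearson_stat(y, mu, size, n_bins)
    exceed = 0
    for _ in range(n_boot):
        y_star = [sample_nb(mu, size, rng) for _ in y]
        mu_s, size_s = fit_nb_moments(y_star)  # refit, as for the real data
        if pearson_stat(y_star, mu_s, size_s, n_bins) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    nb_like = [sample_nb(5.0, 2.0, rng) for _ in range(200)]
    bimodal = [0] * 100 + [30] * 100  # two point masses, not NB-shaped
    print(nb_gof_bootstrap(nb_like))   # usually not small
    print(nb_gof_bootstrap(bimodal))   # expected to be small
```

Refitting inside the bootstrap loop is deliberate: if the simulated samples were scored against the parameters estimated from the original data instead of their own refitted estimates, the reference distribution would ignore estimation error and the test would be conservative.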

Similar articles

Statistical modelling of falls count data with excess zeros.

Inj Prev. 2011 Aug;17(4):266-70. doi: 10.1136/ip.2011.031740. Epub 2011 Jun 8.

A methodology to design heuristics for model selection based on the characteristics of data: Application to investigate when the Negative Binomial Lindley (NB-L) is preferred over the Negative Binomial (NB).

Accid Anal Prev. 2017 Oct;107:186-194. doi: 10.1016/j.aap.2017.07.002. Epub 2017 Sep 5.

Statistical modelling for falls count data.

Accid Anal Prev. 2010 Mar;42(2):384-92. doi: 10.1016/j.aap.2009.08.018. Epub 2009 Oct 1.

A comparison of statistical methods for modeling count data with an application to hospital length of stay.

BMC Med Res Methodol. 2022 Aug 4;22(1):211. doi: 10.1186/s12874-022-01685-8.

On performance of parametric and distribution-free models for zero-inflated and over-dispersed count responses.

Stat Med. 2015 Oct 30;34(24):3235-45. doi: 10.1002/sim.6560. Epub 2015 Jun 15.

Randomized quantile residuals for diagnosing zero-inflated generalized linear mixed models with applications to microbiome count data.

BMC Bioinformatics. 2021 Nov 25;22(1):564. doi: 10.1186/s12859-021-04371-6.

On the use of zero-inflated and hurdle models for modeling vaccine adverse event count data.

J Biopharm Stat. 2006;16(4):463-81. doi: 10.1080/10543400600719384.

Count data distributions and their zero-modified equivalents as a framework for modelling microbial data with a relatively high occurrence of zero counts.

Int J Food Microbiol. 2010 Jan 1;136(3):268-77. doi: 10.1016/j.ijfoodmicro.2009.10.016. Epub 2009 Oct 28.

Analyzing hospitalization data: potential limitations of Poisson regression.

Nephrol Dial Transplant. 2015 Aug;30(8):1244-9. doi: 10.1093/ndt/gfv071. Epub 2015 Mar 25.

Cited by

Poisson Beta Regression for Count Data With an Application to Hospital Length of Stay Data.

Stat Med. 2025 Aug;44(18-19):e70217. doi: 10.1002/sim.70217.

Detecting differential transcript usage in complex diseases with SPIT.

Cell Rep Methods. 2024 Mar 25;4(3):100736. doi: 10.1016/j.crmeth.2024.100736. Epub 2024 Mar 19.

Simple and flexible sign and rank-based methods for testing for differential abundance in microbiome studies.

PLoS One. 2023 Sep 26;18(9):e0292055. doi: 10.1371/journal.pone.0292055. eCollection 2023.

Detecting differential transcript usage in complex diseases with SPIT.

bioRxiv. 2023 Jul 10:2023.07.10.548289. doi: 10.1101/2023.07.10.548289.

Benchmarking differential abundance analysis methods for correlated microbiome sequencing data.

Brief Bioinform. 2023 Jan 19;24(1). doi: 10.1093/bib/bbac607.

llperm: a permutation of regressor residuals test for microbiome data.

BMC Bioinformatics. 2022 Dec 12;23(1):540. doi: 10.1186/s12859-022-05088-w.

Exploring the Microbiome Analysis and Visualization Landscape.

Front Bioinform. 2021 Dec 2;1:774631. doi: 10.3389/fbinf.2021.774631. eCollection 2021.

Investigating differential abundance methods in microbiome data: A benchmark study.

PLoS Comput Biol. 2022 Sep 8;18(9):e1010467. doi: 10.1371/journal.pcbi.1010467. eCollection 2022 Sep.

Differential expression of single-cell RNA-seq data using Tweedie models.

Stat Med. 2022 Aug 15;41(18):3492-3510. doi: 10.1002/sim.9430. Epub 2022 Jun 2.

Exaggerated false positives by popular differential expression methods when analyzing human population samples.

Genome Biol. 2022 Mar 15;23(1):79. doi: 10.1186/s13059-022-02648-4.
