Estimating and comparing microbial diversity in the presence of sequencing errors.

Authors

Chiu Chun-Huo, Chao Anne

Affiliation

Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan.

Publication

PeerJ. 2016 Feb 1;4:e1634. doi: 10.7717/peerj.1634. eCollection 2016.

DOI:10.7717/peerj.1634
PMID:26855872
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4741086/
Abstract

Estimating and comparing microbial diversity are statistically challenging due to limited sampling and possible sequencing errors for low-frequency counts, producing spurious singletons. The inflated singleton count seriously affects statistical analysis and inferences about microbial diversity. Previous statistical approaches to tackle the sequencing errors generally require different parametric assumptions about the sampling model or about the functional form of frequency counts. Different parametric assumptions may lead to drastically different diversity estimates. We focus on nonparametric methods which are universally valid for all parametric assumptions and can be used to compare diversity across communities.

We develop here a nonparametric estimator of the true singleton count to replace the spurious singleton count in all methods/approaches. Our estimator of the true singleton count is in terms of the frequency counts of doubletons, tripletons and quadrupletons, provided these three frequency counts are reliable.

To quantify microbial alpha diversity for an individual community, we adopt the measure of Hill numbers (effective number of taxa) under a nonparametric framework. Hill numbers, parameterized by an order q that determines the measures' emphasis on rare or common species, include taxa richness (q = 0), Shannon diversity (q = 1, the exponential of Shannon entropy), and Simpson diversity (q = 2, the inverse of Simpson index). A diversity profile which depicts the Hill number as a function of order q conveys all information contained in a taxa abundance distribution.

Based on the estimated singleton count and the original non-singleton frequency counts, two statistical approaches (non-asymptotic and asymptotic) are developed to compare microbial diversity for multiple communities. (1) A non-asymptotic approach refers to the comparison of estimated diversities of standardized samples with a common finite sample size or sample completeness. This approach aims to compare diversity estimates for equally-large or equally-complete samples; it is based on the seamless rarefaction and extrapolation sampling curves of Hill numbers, specifically for q = 0, 1 and 2. (2) An asymptotic approach refers to the comparison of the estimated asymptotic diversity profiles. That is, this approach compares the estimated profiles for complete samples or samples whose size tends to be sufficiently large. It is based on statistical estimation of the true Hill number of any order q ≥ 0.

In the two approaches, replacing the spurious singleton count by our estimated count, we can greatly remove the positive biases associated with diversity estimates due to spurious singletons and also make fair comparisons across microbial communities, as illustrated in our simulation results and in applying our method to analyze sequencing data from viral metagenomes.
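The quantities named in the abstract can be illustrated with a minimal Python sketch. It computes the plain empirical (plug-in) Hill number of order q from raw taxon counts, together with the frequency counts (how many taxa are seen exactly once, twice, and so on). It is not the authors' estimator: it applies no singleton correction, no rarefaction or extrapolation, and no asymptotic estimation. The function names `hill_number` and `frequency_counts` are hypothetical and chosen only for this illustration.

```python
import math
from collections import Counter


def hill_number(counts, q):
    """Empirical (plug-in) Hill number of order q from raw taxon counts.

    Illustrative only: no correction for spurious singletons or
    undetected taxa is applied.
    """
    positive = [c for c in counts if c > 0]  # unobserved taxa carry no weight
    n = sum(positive)
    if n == 0:
        raise ValueError("need at least one positive count")
    p = [c / n for c in positive]  # relative abundances
    if math.isclose(q, 1.0):
        # q = 1 is defined as a limit: the exponential of Shannon entropy
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    # general case: (sum of p_i^q)^(1 / (1 - q))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


def frequency_counts(counts):
    """Map k -> number of taxa observed exactly k times (f_1, f_2, ...)."""
    return dict(Counter(c for c in counts if c > 0))


if __name__ == "__main__":
    sample = [8, 2, 1, 1]  # made-up abundances for four taxa
    for order in (0, 1, 2):
        print(order, hill_number(sample, order))
    print(frequency_counts(sample))
```

For a perfectly even community every order gives the same value (the number of taxa), while for uneven communities the Hill number decreases as q grows, which is what a diversity profile displays. In the frequency counts, the entry for k = 1 is the singleton count that the abstract proposes to replace with an estimate based on the entries for k = 2, 3 and 4.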

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a93e/4741086/2aabe2c9f8c5/peerj-04-1634-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a93e/4741086/87b18dad9d47/peerj-04-1634-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a93e/4741086/7c57d26f1430/peerj-04-1634-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a93e/4741086/7863966805aa/peerj-04-1634-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a93e/4741086/588e12820470/peerj-04-1634-g005.jpg