Enhancing diversity analysis by repeatedly rarefying next generation sequencing data describing microbial communities.

Author affiliations

Department of Biology, University of Waterloo, 200 University Ave. W, Waterloo, ON, N2L 3G1, Canada.

Department of Civil and Environmental Engineering, University of Waterloo, 200 University Ave. W, Waterloo, ON, N2L 3G1, Canada.

Publication information

Sci Rep. 2021 Nov 16;11(1):22302. doi: 10.1038/s41598-021-01636-1.

Abstract

Amplicon sequencing has revolutionized our ability to study DNA collected from environmental samples by providing a rapid and sensitive technique for microbial community analysis that eliminates the challenges associated with lab cultivation and taxonomic identification through microscopy. In water resources management, it can be especially useful to evaluate ecosystem shifts in response to natural and anthropogenic landscape disturbances to signal potential water quality concerns, such as the detection of toxic cyanobacteria or pathogenic bacteria. Amplicon sequencing data consist of discrete counts of sequence reads, the sum of which is the library size. Groups of samples typically have different library sizes that are not representative of biological variation; library size normalization is required to meaningfully compare diversity between them. Rarefaction is a widely used normalization technique that involves the random subsampling of sequences from the initial sample library to a selected normalized library size. This process is often dismissed as statistically invalid because subsampling effectively discards a portion of the observed sequences, yet it remains prevalent in practice and the suitability of rarefying, relative to many other normalization approaches, for diversity analysis has been argued. Here, repeated rarefying is proposed as a tool to normalize library sizes for diversity analyses. This enables (i) proportionate representation of all observed sequences and (ii) characterization of the random variation introduced to diversity analyses by rarefying to a smaller library size shared by all samples. While many deterministic data transformations are not tailored to produce equal library sizes, repeatedly rarefying reflects the probabilistic process by which amplicon sequencing data are obtained as a representation of the amplified source microbial community. Specifically, it evaluates which data might have been obtained if a particular sample's library size had been smaller and allows graphical representation of the effects of this library size normalization process upon diversity analysis results.
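
The idea of repeated rarefying can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sample counts, the function names (`rarefy`, `shannon`, `repeated_rarefy_shannon`), the choice of the Shannon index, the number of repeats, and subsampling without replacement are all assumptions made for illustration. Each sample is rarefied many times to the smallest library size, and the spread of the resulting diversity values shows the random variation introduced by the normalization.

```python
import math
import random
from collections import Counter
from statistics import mean, stdev


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from one sample's taxon counts."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the sample's library size")
    # random.sample with `counts` (Python 3.9+) treats taxon i as repeated counts[i] times.
    drawn = rng.sample(range(len(counts)), k=depth, counts=counts)
    tally = Counter(drawn)
    return [tally.get(i, 0) for i in range(len(counts))]


def shannon(counts):
    """Shannon diversity index (natural log) of a vector of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def repeated_rarefy_shannon(counts, depth, n_repeats=100, seed=0):
    """Rarefy one sample `n_repeats` times and return the Shannon index of each draw."""
    rng = random.Random(seed)
    return [shannon(rarefy(counts, depth, rng)) for _ in range(n_repeats)]


# Hypothetical read counts per taxon for two samples with unequal library sizes.
samples = {
    "A": [500, 300, 150, 40, 10],  # library size 1000
    "B": [90, 60, 30, 15, 5],      # library size 200
}

# Normalize every sample to the smallest library size among them.
depth = min(sum(c) for c in samples.values())

for name, counts in samples.items():
    values = repeated_rarefy_shannon(counts, depth)
    print(f"sample {name}: mean Shannon = {mean(values):.3f}, sd = {stdev(values):.3f}")
```

In this sketch, sample B is already at the normalized library size, so every repetition returns the same counts and its standard deviation is zero; sample A is subsampled from 1000 reads to 200, so its diversity estimate varies between repetitions. Plotting the per-repeat values, rather than only their mean, is one way to represent that variation graphically.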

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d611/8595385/db6f139b6a8d/41598_2021_1636_Fig1_HTML.jpg
