Royalty Taylor M, Steen Andrew D
Department of Earth and Planetary Sciences, University of Tennessee, Knoxville, Tennessee, USA.
mSystems. 2019 Sep 17;4(5):e00384-19. doi: 10.1128/mSystems.00384-19.
We applied theoretical and simulation-based approaches to characterize how microbial community structure influences the amount of sequencing effort required to reconstruct metagenomes that are assembled from short-read sequences. First, a coupon collector equation was proposed as an analytical model for predicting sequencing effort as a function of microbial community structure. Characterization was performed by varying community structure properties such as richness, evenness, and genome size. Simulations demonstrated that while community richness and evenness influenced the sequencing effort required to sequence a community metagenome to exhaustion, the effort necessary to sequence an individual genome to a target fraction of exhaustion depended only on the relative abundance of the genome and its genome size. A second analysis evaluated the quantity, completion, and contamination of metagenome-assembled genomes (MAGs) as a function of sequencing effort on four preexisting sequence read data sets from different environments. These data sets were subsampled to various degrees of completeness to simulate the effect of sequencing effort on MAG retrieval. Modeling suggested that sequencing efforts beyond what is typical in published experiments (1 to 10 Gbp) would generate diminishing returns in terms of MAG binning. A software tool, Genome Relative Abundance to Sequencing Effort (GRASE), was created to help investigators explore this relationship further. Reevaluating the relationship between sequencing effort and binning success in terms of genome relative abundance, as opposed to base pairs, provides a constraint on sequencing experiments that is based on the relative abundance of microbes in an environment rather than on arbitrary levels of sequencing effort.
Short-read sequencing with Illumina sequencing technology provides an accurate, high-throughput method for characterizing the metabolic potential of microbial communities. Short-read sequences can be assembled and binned into metagenome-assembled genomes, thus shedding light on the function of microbial ecosystems that are important for health, agriculture, and Earth system processes. The work presented here provides an analytical framework for selecting sequencing effort as a function of genome relative abundance. As such, projects that aim to create metagenome-assembled genomes can set their sequencing effort using the rarest target genome as the constraining threshold. We hope that the results presented here, as well as GRASE, will be valuable to researchers planning sequencing experiments.
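The dependence of per-genome sequencing effort on relative abundance and genome size can be illustrated with a minimal Python sketch. It is not the coupon collector equation proposed in the paper and it is not GRASE; the abstract does not give either. The first function is the textbook equal-probability coupon collector expectation, and the second swaps in the standard Lander-Waterman coverage approximation as a stand-in. The function names, and the assumption that relative abundance means the fraction of sequenced bases originating from the target genome, are hypothetical choices made for illustration only.

```python
import math


def expected_draws(n_coupons: int, n_target: int) -> float:
    """Expected draws needed to see n_target distinct coupons out of
    n_coupons equally likely ones (classic coupon collector problem)."""
    if not 0 <= n_target <= n_coupons:
        raise ValueError("n_target must lie between 0 and n_coupons")
    # Once i distinct coupons are held, the next new one takes
    # n_coupons / (n_coupons - i) draws on average.
    return sum(n_coupons / (n_coupons - i) for i in range(n_target))


def effort_bp(genome_size_bp: float, relative_abundance: float,
              target_fraction: float) -> float:
    """Total base pairs to sequence so that the expected covered fraction
    of one genome reaches target_fraction.

    Assumption (not from the paper): Lander-Waterman coverage, where
    covered fraction = 1 - exp(-c) and c = effort * abundance / genome size.
    """
    if not 0 < relative_abundance <= 1:
        raise ValueError("relative_abundance must be in (0, 1]")
    if not 0 <= target_fraction < 1:
        raise ValueError("target_fraction must be in [0, 1)")
    required_coverage = -math.log(1.0 - target_fraction)
    return required_coverage * genome_size_bp / relative_abundance


if __name__ == "__main__":
    # Hypothetical example: a 5 Mbp genome at 1% relative abundance,
    # sequenced until 95% of its positions are expected to be covered.
    print(f"{effort_bp(5e6, 0.01, 0.95):.3e} bp")
```

Under these assumptions the required effort scales linearly with genome size and inversely with relative abundance, so the rarest target genome sets the effort for the whole experiment. Community richness and evenness do not appear in `effort_bp` at all, which mirrors the per-genome result described in the abstract.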