Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China.
Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong, China.
Microbiome. 2020 Nov 11;8(1):156. doi: 10.1186/s40168-020-00929-3.
The human microbiota are complex systems that play important roles in physiology and disease. Sequencing the microbial genomes in the microbiota helps in interpreting their activities, but the vast majority of these microbes cannot be isolated for individual sequencing. Current metagenomics practice therefore uses short-read sequencing to sequence a mixture of microbial genomes simultaneously. Short reads, however, leave ambiguities during genome assembly, leading to unsatisfactory microbial genome completeness and contig continuity. Linked-read sequencing removes some of these ambiguities by attaching the same barcode to the reads from a long DNA fragment (10-100 kb), thus improving metagenome assembly. However, it is not clear how the choice of several linked-read sequencing parameters affects assembly quality.
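The barcoding idea described above can be illustrated with a minimal sketch that groups reads by their barcode, so that reads likely to come from the same long fragment are kept together. The function name, the (barcode, sequence) tuple layout, and the toy barcodes are assumptions made for illustration; they do not reflect any particular pipeline or file format used by the authors.

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Group (barcode, sequence) pairs so reads sharing a barcode stay together.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


# Toy input: barcodes and sequences are made up for illustration.
reads = [
    ("BC01", "ACGTAC"),
    ("BC02", "TTGACA"),
    ("BC01", "GGCATT"),
]
clouds = group_reads_by_barcode(reads)
```

Because several fragments can share one partition, a barcode group may still mix reads from different genomes, so grouping narrows the candidates rather than resolving every ambiguity.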
We first examined the effect of read depth (C) on metagenome assembly from linked-reads in simulated data and a mock community. C correlated positively with the length of the assembled sequences but had little effect on their quality. Tests on real data from the human gut microbiome corroborated the latter observation: C had only a minor impact on sequence quality and on the proportion of bins annotated as draft genomes. In contrast, metagenome assembly quality was sensitive to read depth per fragment (C_R) and DNA fragment physical depth (C_F). For the same C, deeper C_R resulted in more draft genomes, while deeper C_F improved the quality of the draft genomes. We also found that average fragment length (μ) had a marginal effect on assemblies, while fragments per partition (N) affected the off-target reads involved in local assembly: lower N values led to better assemblies by reducing the ambiguity introduced by off-target reads. In general, linked-reads improved contig N50 compared with Illumina short-reads, but not compared with PacBio CCS (circular consensus sequencing) long-reads.
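The trade-off "for the same C" can be made concrete with a small sketch. It writes read depth per fragment as C_R and fragment physical depth as C_F (subscripts assumed here), and relies on the commonly used linked-read relation C = C_R × C_F, which is an assumption not stated in the text above. The function names and numeric values are illustrative only.

```python
def total_read_depth(c_r, c_f):
    """Total read depth C under the assumed relation C = C_R * C_F."""
    return c_r * c_f


def fragment_depth_for(c, c_r):
    """Fragment physical depth C_F needed to reach total depth C at a given C_R."""
    if c_r <= 0:
        raise ValueError("c_r must be positive")
    return c / c_r


# For one fixed total depth, list the (C_R, C_F) pairs that achieve it.
# The values are arbitrary and chosen only to show the trade-off.
target_c = 100.0
splits = [(c_r, fragment_depth_for(target_c, c_r)) for c_r in (0.25, 0.5, 1.0)]
```

Under this relation, raising C_R at a fixed C necessarily lowers C_F, which is why the two parameters have to be considered together.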
We comprehensively investigated the influence of linked-read sequencing parameters on metagenome assembly. While the quality of genome assembly from linked-reads cannot rival that from PacBio CCS long-reads, the case for linked-read sequencing remains persuasive because of its low cost and high base quality. Our study suggests that the probable best practice for metagenome assembly with linked-reads is to merge the linked-reads from multiple libraries, each with sufficient read depth per fragment (C_R) but a smaller amount of input DNA.
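One way to reason about the suggested practice is a toy model of merging libraries. It assumes that fragment physical depths (C_F) add up across libraries, and that the merged read depth per fragment (C_R) is the C_F-weighted mean of the per-library values. This is a simplification for illustration, not the authors' analysis; the `Library` class, `merge_libraries` function, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Library:
    c_r: float  # read depth per fragment (assumed notation C_R)
    c_f: float  # physical depth of fragments (assumed notation C_F)


def merge_libraries(libraries):
    """Toy model: C_F values add; C_R becomes the C_F-weighted mean."""
    total_c_f = sum(lib.c_f for lib in libraries)
    if total_c_f <= 0:
        raise ValueError("merged fragment depth must be positive")
    mean_c_r = sum(lib.c_r * lib.c_f for lib in libraries) / total_c_f
    return Library(c_r=mean_c_r, c_f=total_c_f)


# Three identical small-input libraries (illustrative values only).
libs = [Library(c_r=0.5, c_f=20.0)] * 3
merged = merge_libraries(libs)
merged_total_depth = merged.c_r * merged.c_f
```

In this model, merging keeps C_R at the per-library level while C_F grows with the number of libraries, so each library can use less input DNA without giving up per-fragment depth.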