Ghosh Tarini Shankar, Mehra Varun, Mande Sharmila S
Biosciences R&D Division, TCS Innovation Labs, 54-B Hadapsar Industrial Estate, Pune, Maharashtra 411013, India.
J Bioinform Comput Biol. 2015 Jun;13(3):1541004. doi: 10.1142/S0219720015410048. Epub 2015 Feb 8.
The metagenomics approach involves extraction, sequencing and characterization of the genomic content of the entire community of microbes present in a given environment. Compared with single-genome data, accurate assembly of metagenomic sequences is a challenging task. Given the huge volume and diverse taxonomic origin of metagenomic sequences, direct application of single-genome assembly methods to metagenomes is likely not only to increase computational infrastructure requirements immensely, but also to result in the formation of chimeric contigs. One strategy to address this challenge is to partition metagenomic sequence datasets into clusters and to assemble the sequences in each cluster separately using any single-genome assembly method. The current study presents such an approach, which uses tetranucleotide usage patterns to first represent sequences as points in a three-dimensional (3D) space. The 3D space is subsequently partitioned into "Grids". Sequences within overlapping grids are then progressively assembled using any available assembler. We demonstrate the applicability of the Grid-Assembly method using various categories of assemblers as well as different simulated metagenomic datasets. Validation results indicate that the Grid-Assembly approach helps improve the overall quality of assembly, in terms of the purity and volume of the assembled contigs.
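The partitioning idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how tetranucleotide usage patterns are mapped to 3D, so a fixed random linear projection is used here purely as a stand-in, and the names `tetra_freq`, `make_projection`, `to_point`, `grid_cell`, `partition` and the `cell_size` parameter are all hypothetical. Grid overlap and the call to an assembler are omitted.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import product

# All 256 tetranucleotides in a fixed (lexicographic) order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freq(seq):
    """Return the 256-dimensional tetranucleotide frequency vector of seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Windows containing non-ACGT characters (e.g. N) are ignored.
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def make_projection(seed=0):
    """ASSUMPTION: a seeded random 256 -> 3 linear map, used only as a
    placeholder for whatever reduction the study actually applies."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in TETRAMERS] for _ in range(3)]


def to_point(freq, projection):
    """Project a frequency vector onto three axes, giving a 3D point."""
    return tuple(sum(w * f for w, f in zip(row, freq)) for row in projection)


def grid_cell(point, cell_size):
    """Map a 3D point to the integer index of the cubic cell containing it."""
    return tuple(math.floor(c / cell_size) for c in point)


def partition(seqs, cell_size=0.05, seed=0):
    """Group sequence names by grid cell (hypothetical cell_size)."""
    projection = make_projection(seed)
    grids = defaultdict(list)
    for name, seq in seqs.items():
        point = to_point(tetra_freq(seq), projection)
        grids[grid_cell(point, cell_size)].append(name)
    return dict(grids)
```

Under these assumptions, each list returned by `partition` would be handed to a single-genome assembler on its own, which is the step intended to limit chimeric contigs and memory use.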