Grid-Assembly: An oligonucleotide composition-based partitioning strategy to aid metagenomic sequence assembly.

Author information

Ghosh Tarini Shankar, Mehra Varun, Mande Sharmila S

Affiliation

Biosciences R&D Division, TCS Innovation Labs, 54-B Hadapsar Industrial Estate, Pune, Maharashtra 411013, India.

Publication information

J Bioinform Comput Biol. 2015 Jun;13(3):1541004. doi: 10.1142/S0219720015410048. Epub 2015 Feb 8.

Abstract

The metagenomics approach involves the extraction, sequencing and characterization of the genomic content of the entire community of microbes present in a given environment. Unlike the assembly of single-genome data, accurate assembly of metagenomic sequences is a challenging task. Given the huge volume and the diverse taxonomic origin of metagenomic sequences, directly applying single-genome assembly methods to metagenomes is likely not only to increase the required computational infrastructure immensely, but also to produce chimeric contigs. One strategy to address this challenge is to partition a metagenomic sequence dataset into clusters and then assemble the sequences in each cluster separately, using any single-genome assembly method. The current study presents such an approach. It first uses tetranucleotide usage patterns to represent sequences as points in a three-dimensional (3D) space. The 3D space is then partitioned into "Grids", and sequences within overlapping grids are progressively assembled using any available assembler. We demonstrate the applicability of the Grid-Assembly method using various categories of assemblers and different simulated metagenomic datasets. Validation results indicate that the Grid-Assembly approach helps improve the overall quality of assembly, in terms of the purity and volume of the assembled contigs.
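The partitioning steps described in the abstract can be illustrated with a minimal Python sketch. It covers three stages: tetranucleotide frequency vectors, reduction to 3D, and assignment of sequences to overlapping grids. The abstract does not say how the 3D coordinates are derived or how grid size and overlap are chosen. The sketch therefore makes two assumptions, and neither is the authors' method. First, it substitutes principal component analysis (computed by power iteration). Second, it uses the `size`/`step` grid parameters. All function names are hypothetical, and the assembly step itself is not shown.

```python
import itertools
import math
import random
from collections import defaultdict

BASES = "ACGT"
# All 256 tetranucleotides in lexicographic order; INDEX maps each to its position.
TETRAMERS = ["".join(p) for p in itertools.product(BASES, repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}


def tetra_profile(seq):
    """Normalized tetranucleotide frequencies (256 values) of one sequence.

    Windows containing characters other than A/C/G/T (e.g. N) are skipped.
    """
    counts = [0] * len(TETRAMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def project_3d(profiles, iters=50):
    """ASSUMPTION: map profiles to 3D with PCA (power iteration plus deflation)."""
    n, d = len(profiles), len(profiles[0])
    mean = [sum(p[j] for p in profiles) / n for j in range(d)]
    X = [[p[j] - mean[j] for j in range(d)] for p in profiles]  # centered data
    components = []
    for c in range(3):
        v = [1.0 / (j + 1 + c) for j in range(d)]  # deterministic start vector
        for _ in range(iters):
            scores = [sum(a * b for a, b in zip(row, v)) for row in X]
            w = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(d)]  # X^T X v
            for u in components:  # stay orthogonal to components already found
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:  # no variance left in this direction
                v = [0.0] * d
                break
            v = [a / norm for a in w]
        components.append(v)
    # Coordinates of each sequence along the three components.
    return [tuple(sum(a * b for a, b in zip(row, u)) for u in components) for row in X]


def overlapping_grids(points, size, step):
    """Group point ids by every axis-aligned cube of edge `size` that contains them.

    ASSUMPTION: cubes start at the per-axis minimum and are spaced `step` apart,
    so step < size makes neighbouring grids overlap.
    """
    if step <= 0 or step > size:
        raise ValueError("need 0 < step <= size")
    lo = [min(p[a] for p in points) for a in range(3)]
    grids = defaultdict(list)
    for pid, p in enumerate(points):
        ranges = []
        for a in range(3):
            t = p[a] - lo[a]
            # Cube i covers [i*step, i*step + size) along this axis.
            first = max(0, math.floor((t - size) / step) + 1)
            last = math.floor(t / step)
            ranges.append(range(first, last + 1))
        for key in itertools.product(*ranges):
            grids[key].append(pid)
    return dict(grids)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: three AT-only and three GC-only "reads".
    reads = ["".join(rng.choice("AT") for _ in range(2000)) for _ in range(3)]
    reads += ["".join(rng.choice("GC") for _ in range(2000)) for _ in range(3)]
    points = project_3d([tetra_profile(r) for r in reads])
    for key, members in sorted(overlapping_grids(points, size=0.2, step=0.1).items()):
        # Each member list is what would be handed to a single-genome assembler.
        print(key, members)
```

Because `step` is smaller than `size`, a sequence near a grid boundary lands in more than one grid. That overlap is what allows neighbouring grids to share sequences during progressive assembly. How the paper actually merges results across overlapping grids is not described in the abstract.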