Vitalant Research Institute, San Francisco, CA, 94118, USA.
Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA, 94107, USA.
BMC Bioinformatics. 2021 Mar 12;22(1):119. doi: 10.1186/s12859-021-04038-2.
Metagenomics is the study of microbial genomes for pathogen detection and discovery in human clinical, animal, and environmental samples via Next-Generation Sequencing (NGS). Metagenome de novo sequence assembly is a crucial analytical step in which longer contigs, ideally whole chromosomes or genomes, are formed from shorter NGS reads. However, the contigs generated by de novo assembly are often highly fragmented and rarely longer than a few kilobase pairs (kb). Therefore, a time-consuming extension process is routinely performed on the de novo assembled contigs.
To facilitate this process, we propose a new tool for extending metagenome contigs after de novo assembly. ContigExtender employs a novel recursive extension strategy that explores multiple extension paths to produce longer, highly accurate contigs. We demonstrate that ContigExtender outperforms existing tools on synthetic, animal, and human metagenomic datasets.
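The abstract does not describe ContigExtender's actual algorithm, so the following is only a minimal, hypothetical sketch of the general idea of recursive, multi-path contig extension. It assumes error-free reads, exact suffix-prefix overlaps, and extension of the right end only. The names `candidate_extensions`, `extend_right`, `min_overlap`, and `max_depth` are illustrative assumptions, not part of ContigExtender.

```python
from typing import List


def candidate_extensions(contig: str, reads: List[str], min_overlap: int) -> List[str]:
    """Return the distinct sequences that reads would append to the contig's right end.

    Hypothetical helper: a read qualifies if one of its prefixes (length >= min_overlap)
    exactly matches a suffix of the contig and the read extends past the contig end.
    """
    extensions = set()
    for read in reads:
        # Try the longest possible overlap first; stop at the first exact match.
        for olen in range(min(len(read) - 1, len(contig)), min_overlap - 1, -1):
            if contig.endswith(read[:olen]):
                extensions.add(read[olen:])
                break
    return sorted(extensions)


def extend_right(contig: str, reads: List[str], min_overlap: int = 3,
                 max_depth: int = 10) -> str:
    """Recursively explore every extension path and return the longest contig found.

    max_depth bounds the recursion so that repeats cannot cause endless extension.
    """
    if max_depth == 0:
        return contig
    best = contig
    for ext in candidate_extensions(contig, reads, min_overlap):
        extended = extend_right(contig + ext, reads, min_overlap, max_depth - 1)
        if len(extended) > len(best):
            best = extended
    return best


reads = ["TACGGA", "GGATTC", "TACCCA"]
print(extend_right("ACGTAC", reads))  # ACGTACGGATTC
```

In this toy example the contig `ACGTAC` has two candidate extensions, `CCA` and `GGA`. A greedy extender that committed to the first candidate would stop at 9 bases. Exploring both paths reveals that the `GGA` branch can be extended further by the read `GGATTC`, which yields a 12-base contig.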
A novel software tool, ContigExtender, has been developed to assist and improve metagenome de novo assembly. ContigExtender effectively extends contigs from a variety of sources and can be incorporated into most viral metagenomics analysis pipelines for a wide range of applications, including pathogen detection and viral discovery.