Earlham Institute, Norwich NR4 7UZ, UK.
Robert Koch Institute, 13353 Berlin, Germany.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btad020.
Assembling contiguous sequence from metagenomic samples is particularly challenging because multiple species, often closely related, are present at varying levels of abundance. Capturing diversity within species, such as viral haplotypes or bacterial strain-level diversity, is even more challenging.
We present MetaCortex, a metagenome assembler that captures intra-species diversity by searching for signatures of local variation along assembled sequences in the underlying assembly graph and outputting these sequences in sequence graph format. We show that MetaCortex produces accurate assemblies with higher genome coverage and contiguity than other popular metagenomic assemblers on mock viral communities with high levels of strain-level diversity and on simulated communities containing simulated strains.
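As a rough illustration of what a "signature of local variation" in an assembly graph can look like, the sketch below builds a toy de Bruijn graph from two hypothetical strain sequences and reports the nodes where the graph branches. This is not MetaCortex's implementation: the function names, the example sequences, and the choice of k = 5 are all assumptions made for illustration only.

```python
from collections import defaultdict


def build_de_bruijn(sequences, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Each k-mer is an edge from its prefix node to its suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Return nodes with more than one successor (candidate variant sites)."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Two hypothetical strains that differ by a single base (illustrative only).
strain_a = "ACGTACGGATCA"
strain_b = "ACGTACTGATCA"

graph = build_de_bruijn([strain_a, strain_b], k=5)
variant_sites = branching_nodes(graph)
```

In this toy graph the two strains share a path until the single-base difference, where one node gains two outgoing edges; the two branches rejoin once the k-mers no longer span the variant. A real assembler would also have to handle sequencing errors, coverage differences, and repeats, which this sketch ignores.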
MetaCortex is implemented in C and supported on macOS and Linux. Source code is freely available to download from https://github.com/SR-Martin/metacortex. The version used for the results presented in this article is available at doi.org/10.5281/zenodo.7273627.
Supplementary data are available at Bioinformatics online.