Department of Population Health and Reproduction, University of California Davis, Davis, USA.
Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA.
Genome Biol. 2020 Jul 6;21(1):164. doi: 10.1186/s13059-020-02066-4.
Genomes computationally inferred from large metagenomic data sets are often incomplete and may be missing functionally important content and strain variation. We introduce an information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome. We apply this system to recover missing content from genome bins and show that substantial genomic sequence variation is present in a real metagenome. Our software implementation is available at https://github.com/spacegraphcats/spacegraphcats under the 3-Clause BSD License.
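The idea of extracting a subgraph surrounding an inferred genome can be illustrated with a toy sketch. This is not the spacegraphcats implementation: it assumes a plain, uncompacted k-mer graph held in memory, ignores reverse complements, and uses a simple breadth-first search. The function names (`kmers`, `build_graph`, `neighborhood`) and the `radius` parameter are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict, deque


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(reads, k):
    """Build an undirected graph linking k-mers that are adjacent in a read.

    Toy assumption: no reverse-complement handling and no graph compaction.
    """
    graph = defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        for km in ks:
            graph[km]  # register the node even if it has no neighbors
        for a, b in zip(ks, ks[1:]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def neighborhood(graph, seeds, radius):
    """Return every k-mer within `radius` edges of any seed k-mer in the graph."""
    seen = {s for s in seeds if s in graph}
    queue = deque((s, 0) for s in seen)
    while queue:
        node, dist = queue.popleft()
        if dist == radius:
            continue  # do not expand beyond the requested radius
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return seen


if __name__ == "__main__":
    # Three made-up reads: the last two diverge after the shared k-mer GTAC.
    reads = ["ACGTAC", "GTACGG", "GTACTT"]
    graph = build_graph(reads, k=4)
    # Treat "ACGTAC" as an incomplete genome bin and query its surroundings.
    bin_kmers = kmers("ACGTAC", 4)
    print(sorted(neighborhood(graph, bin_kmers, radius=1)))
```

In this made-up example, the query bin ends at the k-mer `GTAC`, and the neighborhood at radius 1 picks up both `TACG` and `TACT`, two diverging continuations that are absent from the bin itself. That is the kind of missing content and sequence variation the abstract refers to, at a vastly smaller scale.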