GraphBin: refined binning of metagenomic contigs using assembly graphs.

Affiliation

Research School of Computer Science, College of Engineering and Computer Science, Australian National University, Canberra ACT 0200, Australia.

Publication information

Bioinformatics. 2020 Jun 1;36(11):3307-3313. doi: 10.1093/bioinformatics/btaa180.

Abstract

MOTIVATION

The field of metagenomics has provided valuable insights into the structure, diversity and ecology of microbial communities. One key step in metagenomics analysis is to assemble reads into longer contigs, which are then binned into groups of contigs that belong to the different species present in the metagenomic sample. Binning of contigs plays an important role in metagenomics, and most available binning algorithms bin contigs using genomic features such as oligonucleotide/k-mer composition and contig coverage. As metagenomic contigs are derived from the assembly process, they are output from the underlying assembly graph, which contains valuable connectivity information between contigs that can be used for binning.

RESULTS

We propose GraphBin, a new binning method that makes use of the assembly graph and applies a label propagation algorithm to refine the binning result of existing tools. We show that GraphBin can make use of the assembly graphs constructed from both the de Bruijn graph and the overlap-layout-consensus approach. Moreover, we demonstrate improved experimental results from GraphBin in terms of identifying mis-binned contigs and binning of contigs discarded by existing binning tools. To the best of our knowledge, this is the first time that the information from the assembly graph has been used in a tool for the binning of metagenomic contigs.
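To make the idea of label propagation on an assembly graph concrete, here is a minimal Python sketch. It is not GraphBin's implementation: the function name `propagate_labels`, the edge-list input and the majority-vote rule with alphabetical tie-breaking are all assumptions chosen for illustration. It covers only the second use case mentioned above (assigning bins to contigs that an existing tool left unbinned) and does not attempt to detect mis-binned contigs.

```python
from collections import Counter, deque


def propagate_labels(edges, initial_bins):
    """Spread bin labels from binned contigs to unbinned neighbours.

    edges        -- iterable of (contig, contig) pairs from the assembly graph
    initial_bins -- dict mapping some contigs to a bin label (hypothetical input)
    Returns a new dict; contigs unreachable from any binned contig stay absent.
    """
    # Build an undirected adjacency list from the contig-contig edges.
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)

    bins = dict(initial_bins)
    # Breadth-first frontier seeded with every contig that already has a bin.
    queue = deque(sorted(bins))
    while queue:
        node = queue.popleft()
        for nbr in sorted(graph.get(node, ())):
            if nbr in bins:
                continue  # never overwrite an existing label in this sketch
            # Majority vote among the neighbour's already-labelled neighbours;
            # ties go to the alphabetically smallest label (an arbitrary choice).
            votes = Counter(bins[n] for n in graph[nbr] if n in bins)
            bins[nbr] = max(sorted(votes), key=votes.get)
            queue.append(nbr)
    return bins


# Toy graph: two chains of contigs, each with one contig binned beforehand.
toy_edges = [("c1", "c2"), ("c2", "c3"), ("c4", "c5"), ("c5", "c6")]
toy_bins = {"c1": "A", "c4": "B"}
print(sorted(propagate_labels(toy_edges, toy_bins).items()))
```

In the toy graph, the contigs connected to `c1` inherit bin `A` and those connected to `c4` inherit bin `B`. A real assembly graph would be parsed from the assembler's output rather than written by hand.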

AVAILABILITY AND IMPLEMENTATION

The source code of GraphBin is available at https://github.com/Vini2/GraphBin.

CONTACT

vijini.mallawaarachchi@anu.edu.au or yu.lin@anu.edu.au.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
