MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample.

Affiliation

Department of Computer Science, The University of Hong Kong, Hong Kong.

Publication information

Bioinformatics. 2012 Sep 15;28(18):i356-i362. doi: 10.1093/bioinformatics/bts397.

Abstract

MOTIVATION

Metagenomic binning remains an important topic in metagenomic analysis. Existing unsupervised binning methods for next-generation sequencing (NGS) reads do not perform well on (i) samples with low-abundance species or (ii) samples (even with high abundance) when there are many extremely low-abundance species. These two problems are common for real metagenomic datasets. Binning methods that can solve these problems are desirable.

RESULTS

We proposed a two-round binning method (MetaCluster 5.0) that aims at identifying both low-abundance and high-abundance species in the presence of a large amount of noise due to many extremely low-abundance species. In summary, MetaCluster 5.0 uses a filtering strategy to remove noise from the extremely low-abundance species. It separates reads of high-abundance species from those of low-abundance species in two different rounds. To overcome the issue of low coverage for low-abundance species, multiple w values are used to group reads with overlapping w-mers: reads from high-abundance species are grouped with high confidence based on a large w, and binning then expands to low-abundance species using a relaxed (shorter) w. Compared with the recent tools TOSS and MetaCluster 4.0, MetaCluster 5.0 can find more species (especially those with low abundance of, say, 6× to 10×) and can achieve better sensitivity and specificity using less memory and running time.
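
The two-round idea of grouping reads by shared w-mers can be illustrated with a minimal Python sketch. This is not the MetaCluster 5.0 implementation: the function names, the union-find grouping, the `min_group_size` cutoff used as a stand-in for the filtering strategy, and the omission of reverse complements are all assumptions made for illustration. Reads that share a w-mer are merged into one group; groups that are large under a strict (large) w are kept first, the remaining reads are regrouped under a relaxed (smaller) w, and reads that still fall into tiny groups are treated as noise.

```python
from collections import defaultdict


def wmers(read, w):
    """Return the set of all length-w substrings (w-mers) of a read."""
    return {read[i:i + w] for i in range(len(read) - w + 1)}


def group_by_shared_wmers(reads, w):
    """Group reads that share at least one w-mer (hypothetical helper).

    Uses union-find; reverse complements are ignored for simplicity.
    """
    parent = list(range(len(reads)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # w-mer -> index of the first read containing it
    for idx, read in enumerate(reads):
        for mer in wmers(read, w):
            if mer in first_seen:
                root_a, root_b = find(idx), find(first_seen[mer])
                if root_a != root_b:
                    parent[root_a] = root_b
            else:
                first_seen[mer] = idx

    groups = defaultdict(list)
    for idx, read in enumerate(reads):
        groups[find(idx)].append(read)
    return list(groups.values())


def two_round_grouping(reads, large_w, small_w, min_group_size):
    """Illustrative two-round grouping; all parameters are assumptions."""
    # Round 1: a strict (large) w; large groups are treated as high abundance.
    round1 = group_by_shared_wmers(reads, large_w)
    high = [g for g in round1 if len(g) >= min_group_size]
    leftover = [r for g in round1 if len(g) < min_group_size for r in g]

    # Round 2: a relaxed (smaller) w applied only to the leftover reads.
    round2 = group_by_shared_wmers(leftover, small_w)
    low = [g for g in round2 if len(g) >= min_group_size]

    # Reads that still sit in tiny groups are discarded as noise.
    noise = [r for g in round2 if len(g) < min_group_size for r in g]
    return high, low, noise
```

Calling `two_round_grouping(reads, large_w=5, small_w=3, min_group_size=2)` on a list of read strings returns three collections: groups found with the large w, groups recovered with the small w, and reads discarded as noise. Using a smaller w in the second round makes overlaps easier to detect at low coverage, at the cost of more chance matches between unrelated reads, which is why the sketch applies it only after the confidently grouped reads have been set aside.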

AVAILABILITY

http://i.cs.hku.hk/~alse/MetaCluster/

CONTACT

chin@cs.hku.hk.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/55eb/3436824/2c37c2fcecec/bts397f1.jpg
