Wang Ying, Hu Haiyan, Li Xiaoman
Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL, 32816, USA.
Burnett School of Biomedical Science, University of Central Florida, Orlando, FL, 32816, USA.
BMC Bioinformatics. 2015 Feb 5;16:36. doi: 10.1186/s12859-015-0473-8.
Binning environmental shotgun reads is one of the most fundamental tasks in metagenomic studies: mixed reads from different species or operational taxonomic units (OTUs) are separated into different groups. While dozens of binning methods are available, there is still room for improvement.
We developed a novel taxonomy-independent approach called MBBC (Metagenomic Binning Based on Clustering) that clusters environmental shotgun reads by considering the k-mer frequency in reads and the Markov properties of the inferred OTUs. Tested on twelve simulated datasets, MBBC reliably estimated the species number, the genome size, and the relative abundance of each species, regardless of whether the reads contained errors. Tested on multiple experimental datasets, MBBC outperformed two state-of-the-art taxonomy-independent methods in the accuracy of the estimated species number, the estimated genome sizes, and the percentage of correctly assigned reads, among other metrics.
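As an illustration of the k-mer frequency signal mentioned above, the following minimal Python sketch computes a normalized k-mer frequency vector for a single read. It is not the MBBC implementation: the function name `kmer_frequencies`, the default `k=4`, and the choice to skip k-mers containing ambiguous bases are assumptions made only for this example.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(read, k=4):
    """Return a dict mapping every possible k-mer to its relative frequency in `read`.

    Hypothetical helper for illustration; k-mers containing characters
    other than A, C, G, T (for example N) are skipped.
    """
    read = read.upper()
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if all(base in ALPHABET for base in kmer):
            counts[kmer] += 1

    total = sum(counts.values())
    all_kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    if total == 0:
        # Read too short or fully ambiguous: return an all-zero profile.
        return {kmer: 0.0 for kmer in all_kmers}
    return {kmer: counts[kmer] / total for kmer in all_kmers}


if __name__ == "__main__":
    profile = kmer_frequencies("ACGTACGT", k=2)
    # Show only the k-mers that actually occur in the read.
    print({kmer: round(f, 3) for kmer, f in profile.items() if f > 0})
```

Frequency profiles of this kind can serve as per-read feature vectors for a clustering step. How MBBC combines them with the Markov properties of the inferred OTUs is described in the paper itself and is not reproduced here.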
We have developed a novel method for binning metagenomic reads based on clustering. The method reliably predicts species numbers, genome sizes, relative species abundances, and k-mer coverage in simple datasets, and it also achieves high accuracy in read binning. The MBBC software is freely available at http://eecs.ucf.edu/~xiaoman/MBBC/MBBC.html.