Zhang Ruichang, Cheng Zhanzhan, Guan Jihong, Zhou Shuigeng
BMC Bioinformatics. 2015;16 Suppl 5(Suppl 5):S2. doi: 10.1186/1471-2105-16-S5-S2. Epub 2015 Mar 18.
BACKGROUND: With the rapid development of high-throughput technologies, researchers can sequence the whole metagenome of a microbial community sampled directly from the environment. Assigning these metagenomic reads to species or taxonomic classes, a task known as binning, is a vital step in metagenomic analysis.

RESULTS: In this paper, we propose a new method, TM-MCluster, for binning metagenomic reads. First, we represent each metagenomic read as a set of "k-mers" together with their frequencies of occurrence in the read. Then, we apply a probabilistic topic model, Latent Dirichlet Allocation (LDA), to the reads; it generates a number of hidden "topics" such that each read can be represented by a distribution vector over the generated topics. Finally, as in the MCluster method, we apply SKWIC, a variant of the classical K-means algorithm with an automatic feature weighting mechanism, to cluster the reads represented by their topic distributions.

CONCLUSIONS: Experiments show that TM-MCluster outperforms major existing methods, including AbundanceBin, MetaCluster 3.0/5.0 and MCluster. This result indicates that topic modeling can effectively improve the binning performance of metagenomic reads.
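The first step described above, turning each read into k-mer frequencies, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `kmer_counts` and `kmer_vector` are hypothetical, the default `k = 4` is chosen only for the example, and the sketch assumes that windows containing characters other than A, C, G, T are skipped and that reverse complements are not merged. The LDA and SKWIC steps are not shown.

```python
from collections import Counter
from itertools import product


def kmer_counts(read, k=4):
    """Count overlapping k-mers in one read (hypothetical helper).

    Assumption: windows containing anything other than A/C/G/T are skipped.
    """
    read = read.upper()
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_vector(read, k=4):
    """Fixed-length count vector over all 4**k k-mers, in lexicographic order.

    Treating each read as a "document" and each k-mer as a "word", a list of
    such vectors forms the document-term matrix a topic model would consume.
    """
    counts = kmer_counts(read, k)
    vocab = ("".join(p) for p in product("ACGT", repeat=k))
    return [counts[kmer] for kmer in vocab]


if __name__ == "__main__":
    reads = ["ACGTACGT", "TTTTGGGGCCCC"]
    matrix = [kmer_vector(r) for r in reads]
    print(len(matrix), len(matrix[0]))
```

Stacking one such vector per read gives a reads-by-k-mers count matrix; in the pipeline the abstract describes, a matrix of this kind would be passed to an LDA implementation, and the resulting per-read topic distributions would then be clustered.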