Pham Diem-Trang, Gao Shanshan, Phan Vinhthuy
Department of Computer Science, The University of Memphis, Memphis, TN 38152, USA.
J Bioinform Comput Biol. 2017 Jun;15(3):1740001. doi: 10.1142/S0219720017400017. Epub 2017 Mar 7.
Determining the abundances of microbial genomes in metagenomic samples is an important problem in analyzing metagenomic data. Although homology-based methods are popular, they have been shown to be computationally expensive because they align tens of millions of reads from metagenomic samples to the reference genomes of hundreds to thousands of environmental microbial species. We introduce an efficient alignment-free approach to estimating the abundances of microbial genomes in metagenomic samples. The approach is based on solving linear and quadratic programs, which are formulated using genome-specific markers (GSM). We compared our method against popular alignment-free and homology-based methods. Without contamination, our method was more accurate than the other alignment-free methods while being much faster than a homology-based method. In more realistic settings, where samples were contaminated with human DNA, our method was the most accurate at predicting abundance across varying levels of contamination. It achieved higher accuracy than both the alignment-free and the homology-based methods.
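The abstract does not give the linear and quadratic programs themselves, so the following is only a minimal sketch of the general idea, under assumptions that are not taken from the paper. It assumes each marker occurs in exactly one genome and once per genome copy, so that the observed count of a marker approximates the sequencing depth of its genome. Under that assumption, minimizing the sum of squared residuals (a quadratic program) reduces to the per-genome mean of marker counts, and minimizing the sum of absolute residuals (a linear program) reduces to the per-genome median. The function name `estimate_abundances`, its parameters, and the toy data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def estimate_abundances(marker_counts, marker_to_genome, loss="l2"):
    """Estimate relative genome abundances from marker counts.

    Illustrative sketch only, not the formulation used in the paper.
    Assumes every marker belongs to exactly one genome, so the fit
    decouples into one independent problem per genome.

    marker_counts:    dict mapping marker -> observed count in the reads
    marker_to_genome: dict mapping marker -> the single genome containing it
    loss:             "l2" (squared error, QP-style) or "l1" (absolute error, LP-style)
    """
    # Group observed counts by genome; markers never observed count as 0.
    # Counts for sequences that are not markers (e.g. from contaminating
    # DNA) are never looked up, so they do not affect the estimate.
    per_genome = defaultdict(list)
    for marker, genome in marker_to_genome.items():
        per_genome[genome].append(marker_counts.get(marker, 0))

    # The minimizer of squared error is the mean; of absolute error, the median.
    # Both are non-negative because counts are non-negative.
    center = mean if loss == "l2" else median
    depth = {g: center(counts) for g, counts in per_genome.items()}

    # Normalize estimated depths into relative abundances.
    total = sum(depth.values())
    if total == 0:
        return {g: 0.0 for g in depth}
    return {g: d / total for g, d in depth.items()}


# Toy data: genome A has three markers, genome B has two.
marker_to_genome = {"m1": "A", "m2": "A", "m3": "A", "m4": "B", "m5": "B"}
marker_counts = {"m1": 10, "m2": 12, "m3": 8, "m4": 30, "m5": 30}

abundances = estimate_abundances(marker_counts, marker_to_genome)
print(abundances)
```

In this toy input the estimated depth is 10 for genome A and 30 for genome B, so the relative abundances are 0.25 and 0.75. A formulation in which markers are shared across genomes or weighted by copy number would no longer decouple and would need a general LP or QP solver.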