Ai Dongmei, Pan Hongfei, Huang Ruocheng, Xia Li C
School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China.
Sinotech Genomics, Shanghai 200120, China.
Genes (Basel). 2018 Jun 20;9(6):313. doi: 10.3390/genes9060313.
With the rapid development of high-throughput sequencing technology, the analysis of metagenomic sequencing data and the accurate, efficient estimation of relative microbial abundance have become important ways to explore the composition and function of microbial communities. The accuracy and efficiency of relative abundance estimation depend closely on the algorithm used and on the choice of reference sequences for sequence alignment. We introduced the microbial core genome as the reference sequence for potential microbes in a metagenomic sample, constructed finite mixture and latent Dirichlet models, and used the Gibbs sampling algorithm to estimate the relative abundance of microorganisms. The simulation results showed that our approach improves efficiency while maintaining high accuracy and is better suited to high-throughput metagenomic data. The new approach is implemented in our CoreProbe package, which provides a pipeline for accurate and efficient estimation of the relative abundance of microbes in a community. The tool is available free of charge from the CoreProbe website. The Docker image can be obtained with the following command: sudo docker pull panhongfei/coreprobe:1.0
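To illustrate the estimation step described above, the following is a minimal Python sketch of Gibbs sampling for a finite mixture model with a Dirichlet prior on the mixing proportions. It is not the CoreProbe implementation: the function name gibbs_abundance, the read_likelihoods input (a per-read likelihood under each reference, assumed to be precomputed from alignments), the prior parameter alpha, and the toy data are all assumptions made for illustration.

```python
import random


def gibbs_abundance(read_likelihoods, n_iter=500, burn_in=100, alpha=1.0, seed=0):
    """Estimate mixing proportions (relative abundances) by Gibbs sampling.

    read_likelihoods[i][k] is an assumed, precomputed likelihood of read i
    under reference k; every read needs at least one nonzero entry.
    """
    rng = random.Random(seed)
    n_refs = len(read_likelihoods[0])
    theta = [1.0 / n_refs] * n_refs   # current proportions, start uniform
    totals = [0.0] * n_refs           # running sum of post-burn-in samples
    kept = 0

    for it in range(n_iter):
        # Step 1: sample the latent source of each read given theta.
        counts = [0] * n_refs
        for lik in read_likelihoods:
            weights = [theta[k] * lik[k] for k in range(n_refs)]
            k = rng.choices(range(n_refs), weights=weights)[0]
            counts[k] += 1

        # Step 2: sample theta from its Dirichlet posterior,
        # drawn as normalized Gamma variates.
        draws = [rng.gammavariate(alpha + c, 1.0) for c in counts]
        total = sum(draws)
        theta = [d / total for d in draws]

        if it >= burn_in:
            totals = [t + p for t, p in zip(totals, theta)]
            kept += 1

    # Posterior mean of the proportions over the retained samples.
    return [t / kept for t in totals]


# Toy data (hypothetical): 60 reads match only reference 0, 30 match only
# reference 1, and 10 match references 0 and 1 equally well.
reads = [[1.0, 0.0, 0.0]] * 60 + [[0.0, 1.0, 0.0]] * 30 + [[0.5, 0.5, 0.0]] * 10
est = gibbs_abundance(reads)
print([round(p, 3) for p in est])
```

In this sketch the ambiguous reads are reassigned at every iteration in proportion to the current abundance estimates, which is how a mixture model resolves reads that align to more than one reference. The reference with no supporting reads keeps a small nonzero estimate because of the Dirichlet prior.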