Tang Jing, Tao Jinglun, Urakawa Hidetoshi, Corander Jukka
University of Helsinki.
Stat Appl Genet Mol Biol. 2007;6:Article30. doi: 10.2202/1544-6115.1303. Epub 2007 Nov 6.
The investigation of microbial communities is an essential part of the study of the biosphere. Flexible molecular fingerprinting tools such as terminal-restriction fragment length polymorphism (T-RFLP) analysis are often applied in such studies to characterize the microbial population. However, such data have so far been analyzed primarily with conventional clustering methods. Here we introduce a Bayesian model-based method for comparing microbial communities using T-RFLP data. Such datasets generally have several challenging features, e.g. sparseness, missing values and structurally zero-valued observations. Our framework takes these features into account through a Bayesian latent class mixture model for the observations. To make inferences under the model we use a recent Markov chain Monte Carlo (MCMC)-based method for Bayesian model selection. To assess the introduced method we analyze both simulated and real datasets. The simulations show that our approach compares favorably to standard statistical clustering tools, such as k-means, hierarchical clustering, and Autoclass. The developed tool is freely available as the software package T-BAPS at http://www.abo.fi/fak/mnf/mate/jc/software/t-baps.html.
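To make the idea of scoring candidate groupings under a latent class model concrete, the following is a minimal sketch and not the T-BAPS implementation. It assumes that each T-RFLP profile is encoded as presence (1) or absence (0) of a peak, that missing values are marked with None and simply skipped, and that each peak within a cluster follows an independent Bernoulli distribution with a Beta(0.5, 0.5) prior. The function names, the encoding and the prior are all illustrative assumptions; structural zeros and the MCMC search over partitions described in the abstract are not modeled here.

```python
from math import lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal_likelihood(profiles, labels, alpha=0.5, beta=0.5):
    """Log marginal likelihood of a partition of binary profiles.

    profiles: list of equal-length lists with 1 (peak present),
              0 (peak absent) or None (missing value, skipped).
    labels:   one cluster label per profile.
    alpha, beta: hyperparameters of the assumed Beta prior.
    """
    n_peaks = len(profiles[0])
    total = 0.0
    for cluster in set(labels):
        members = [p for p, lab in zip(profiles, labels) if lab == cluster]
        for j in range(n_peaks):
            # Only observed entries contribute to the likelihood.
            observed = [p[j] for p in members if p[j] is not None]
            ones = sum(observed)
            zeros = len(observed) - ones
            # Beta-Bernoulli marginal with the peak frequency integrated out.
            total += log_beta(alpha + ones, beta + zeros) - log_beta(alpha, beta)
    return total


# Hypothetical toy data: four communities, four fragment positions.
profiles = [
    [1, 1, 0, 0],
    [1, 1, 0, None],
    [0, 0, 1, 1],
    [0, None, 1, 1],
]

two_clusters = log_marginal_likelihood(profiles, [0, 0, 1, 1])
one_cluster = log_marginal_likelihood(profiles, [0, 0, 0, 0])
print("two clusters:", two_clusters)
print("one cluster: ", one_cluster)
```

On this toy input the two-cluster partition receives the higher score, which is the kind of comparison a Bayesian model selection procedure relies on when choosing among partitions. A full method would also need a prior over partitions and a search strategy, which this sketch leaves out.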