Corander Jukka, Tang Jing
Department of Mathematics and Statistics, P.O. Box 68, University of Helsinki, 00014 Helsinki, Finland.
Math Biosci. 2007 Jan;205(1):19-31. doi: 10.1016/j.mbs.2006.09.015. Epub 2006 Sep 28.
The Bayesian model-based approach to inferring hidden genetic population structures using multilocus molecular markers has become a popular tool within certain branches of biology. In particular, it has been shown that heterogeneous data arising from genetically dissimilar latent groups of individuals can be effectively modelled using an unsupervised classification formulation. However, most currently employed models ignore potential linkage within the molecular information used, and can therefore lead to biased inferences under certain circumstances. Utilizing the general theory of graphical models, we develop a framework that accounts for dependences within both linked molecular marker loci and DNA sequence data. Due to a high level of sequence conservation among eukaryotic species, the latter aspect is particularly relevant for analyzing rapidly evolving microbial species. The advantages of incorporating the dependence due to linkage in the classification models are illustrated by analyses of both simulated data and real samples of Bacillus cereus.