Corander Jukka, Waldmann Patrik, Marttinen Pekka, Sillanpää Mikko J
Rolf Nevanlinna Institute, P.O. Box 4, Fin-00014 University of Helsinki, Finland.
Bioinformatics. 2004 Oct 12;20(15):2363-9. doi: 10.1093/bioinformatics/bth250. Epub 2004 Apr 8.
Bayesian statistical methods based on simulation techniques have recently been shown to provide powerful tools for the analysis of genetic population structure. We have previously developed a Markov chain Monte Carlo (MCMC) algorithm for characterizing genetically divergent groups based on molecular markers and the geographical sampling design of the dataset. However, for large-scale datasets such algorithms may get stuck in local maxima of the parameter space. Therefore, we have modified our earlier algorithm to support multiple parallel MCMC chains, with enhanced features that enable considerably faster and more reliable estimation than the earlier version of the algorithm. We also consider a hierarchical tree representation, from which a Bayesian model-averaged structure estimate can be extracted. The algorithm is implemented in a computer program that features a user-friendly interface and built-in graphics. The enhanced features are illustrated by analyses of simulated data and an extensive human molecular dataset.
Freely available at http://www.rni.helsinki.fi/~jic/bapspage.html.
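The motivation for running several chains can be illustrated with a toy example. The sketch below is not the algorithm described in the abstract: it runs independent random-walk Metropolis chains on a hypothetical one-dimensional bimodal target and keeps the best state found by any chain. The target density, the function names (log_target, run_chain, run_parallel_chains), the step size, and the starting points are all assumptions chosen for illustration; the chains are run one after another here rather than truly in parallel.

```python
import math
import random


def log_target(x):
    # Hypothetical bimodal log-density (an assumption for illustration):
    # a minor mode near -3 (weight 0.2) and a major mode near 4 (weight 0.8).
    a = math.log(0.2) - 0.5 * ((x + 3.0) / 0.5) ** 2
    b = math.log(0.8) - 0.5 * ((x - 4.0) / 0.5) ** 2
    m = max(a, b)
    # Log-sum-exp keeps the mixture numerically stable.
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def run_chain(start, n_steps, step_size, rng):
    """Random-walk Metropolis; returns (best state visited, its log-density)."""
    x, lp = start, log_target(start)
    best_x, best_lp = x, lp
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        lp_new = log_target(proposal)
        # Accept with probability min(1, exp(lp_new - lp)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
            if lp > best_lp:
                best_x, best_lp = x, lp
    return best_x, best_lp


def run_parallel_chains(starts, n_steps=2000, step_size=0.3, seed=0):
    """Run one chain per starting point and report the best result overall."""
    rng = random.Random(seed)
    results = [run_chain(s, n_steps, step_size, rng) for s in starts]
    best = max(results, key=lambda r: r[1])
    return best, results


if __name__ == "__main__":
    best, results = run_parallel_chains(starts=[-3.0, 4.0])
    for start, (x, lp) in zip([-3.0, 4.0], results):
        print(f"start={start:+.1f}  best_x={x:+.3f}  best_logp={lp:.3f}")
    print("overall best:", best)
```

With a small step size, a chain started near the minor mode is very unlikely to cross the low-density region between the modes, so a single run can report a local maximum. Comparing the best log-density across chains with different starting points is one simple way to detect and reduce that risk.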