Department of Neurobiology, Harvard Medical School, Boston, Massachusetts, United States of America.
PLoS One. 2012;7(2):e30906. doi: 10.1371/journal.pone.0030906. Epub 2012 Feb 17.
Genotyping polyploids is essential for constructing genetic maps and assembling complex plant genomes. Despite its significance, polyploid genotyping remains largely unsolved and lacks statistical formality. In this paper, we introduce a graphical Bayesian model for SNP genotyping data. The model can infer genotypes even when the ploidy of the population is unknown. We also introduce an algorithm that finds the exact maximum a posteriori (MAP) genotype configuration under this model. The algorithm is implemented in SuperMASSA, a freely available web-based software package. We demonstrate the utility, efficiency, and flexibility of the model and algorithm by applying them to polyploid data sets from two different platforms: Illumina GoldenGate data from potato and Sequenom MassARRAY data from sugarcane. Our method achieves state-of-the-art performance on both data sets and can be trivially adapted to models that use prior information about any platform or species.
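To make the idea of MAP genotype calling with unknown ploidy concrete, here is a minimal toy sketch in Python. It is not the SuperMASSA algorithm or the paper's graphical model: it assumes each individual is summarized by a single allele-signal ratio in [0, 1], that individuals are independent, that the noise is Gaussian with a fixed `sigma`, and that the prior over dosages is uniform. All function names, the candidate ploidies, and the noise level are hypothetical choices made for illustration.

```python
import math

def log_gauss(x, mu, sigma):
    """Log density of a Gaussian with mean mu and std sigma at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def map_genotypes(ratios, ploidy, sigma=0.05):
    """Toy MAP dosage call for each individual at a fixed ploidy.

    Assumption: dosage d (0..ploidy) produces an expected ratio d / ploidy,
    and every dosage has the same prior probability 1 / (ploidy + 1).
    Because individuals are independent in this toy model, the
    per-individual argmax is also the joint MAP configuration.
    """
    means = [d / ploidy for d in range(ploidy + 1)]
    log_prior = -math.log(len(means))
    genotypes, total = [], 0.0
    for r in ratios:
        scores = [log_gauss(r, m, sigma) + log_prior for m in means]
        best = max(range(len(means)), key=scores.__getitem__)
        genotypes.append(best)
        total += scores[best]
    return genotypes, total

def infer_ploidy(ratios, candidates=(2, 4, 6, 8), sigma=0.05):
    """Return (ploidy, genotypes, log_score) with the highest joint score."""
    best = None
    for ploidy in candidates:
        genotypes, score = map_genotypes(ratios, ploidy, sigma)
        if best is None or score > best[2]:
            best = (ploidy, genotypes, score)
    return best

# Hypothetical ratios clustered near 0, 1/4, 1/2, 3/4, and 1.
ratios = [0.0, 0.26, 0.24, 0.5, 0.51, 0.74, 1.0, 0.49]
ploidy, genotypes, score = infer_ploidy(ratios)
print(ploidy, genotypes)
```

In this sketch the uniform prior spreads its mass over more dosage classes as ploidy grows, so a higher ploidy is preferred only when the extra classes fit the data noticeably better. A realistic model would replace that prior with one based on population structure or parental genotypes.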