Tracing cattle breeds with principal components analysis ancestry informative SNPs.
Affiliation
Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York, United States of America.
Publication information
PLoS One. 2011 Apr 7;6(4):e18007. doi: 10.1371/journal.pone.0018007.
The recent release of the Bovine HapMap dataset represents the most detailed survey of bovine genetic diversity to date, providing an important resource for the design and development of livestock production. We studied this dataset, comprising more than 30,000 single nucleotide polymorphisms (SNPs) for 19 breeds (13 taurine, three zebu, and three hybrid breeds), seeking to identify small panels of genetic markers that can be used to trace the breed of unknown cattle samples. Taking advantage of the power of Principal Components Analysis and algorithms that we have recently described for the selection of Ancestry Informative Markers from genome-wide datasets, we present a decision tree that can be used to accurately infer the origin of individual cattle. In doing so, we present a thorough examination of population genetic structure in modern bovine breeds. Performing extensive cross-validation experiments, we demonstrate that 250-500 carefully selected SNPs suffice to achieve close to 100% prediction accuracy of individual ancestry when this particular set of 19 breeds is considered. Our methods, coupled with the dense genotypic data that are becoming increasingly available, have the potential to become a valuable tool with considerable impact on worldwide livestock production. They can be used to inform the design of studies of the genetic basis of economically important traits in cattle, as well as breeding programs and efforts to conserve biodiversity. Furthermore, the SNPs that we have identified can provide a reliable solution for the traceability of breed-specific branded products.
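The abstract does not spell out the marker-selection algorithm, so the following is only a minimal sketch of one common way to rank SNPs with Principal Components Analysis: score each SNP by the sum of its squared loadings on the top k principal components and keep the highest-scoring ones as a panel. Everything here is an assumption made for illustration: the function names (`simulate_genotypes`, `pca_snp_scores`, `select_panel`, `nearest_centroid_accuracy`) are hypothetical, the genotypes are simulated for two made-up breeds rather than taken from the Bovine HapMap data, and a nearest-centroid classifier stands in for the decision tree described in the paper.

```python
import numpy as np


def simulate_genotypes(n_per_breed=40, n_snps=300, n_informative=30, seed=0):
    """Simulate 0/1/2 genotypes for two hypothetical breeds (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Most SNPs share the same allele frequency in both breeds.
    base = rng.uniform(0.2, 0.8, size=n_snps)
    freqs = np.vstack([base, base.copy()])
    # The first n_informative SNPs differ strongly between the breeds.
    freqs[0, :n_informative] = 0.9
    freqs[1, :n_informative] = 0.1
    labels = np.repeat([0, 1], n_per_breed)
    genotypes = rng.binomial(2, freqs[labels])  # shape: (samples, SNPs)
    return genotypes.astype(float), labels


def pca_snp_scores(genotypes, k):
    """Score each SNP by its summed squared loading on the top k components."""
    centered = genotypes - genotypes.mean(axis=0)
    # Rows of vt are the principal axes in SNP space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (vt[:k] ** 2).sum(axis=0)


def select_panel(genotypes, k, panel_size):
    """Return the indices of the panel_size highest-scoring SNPs."""
    scores = pca_snp_scores(genotypes, k)
    return np.argsort(scores)[::-1][:panel_size]


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Assign each test sample to the breed with the closest mean genotype."""
    breeds = np.unique(train_y)
    centroids = np.vstack([train_x[train_y == b].mean(axis=0) for b in breeds])
    dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predicted = breeds[dists.argmin(axis=1)]
    return float((predicted == test_y).mean())


genotypes, labels = simulate_genotypes()
# Hold out every second sample so the panel is chosen on training data only.
train = np.arange(len(labels)) % 2 == 0
test = ~train
panel = select_panel(genotypes[train], k=1, panel_size=30)
accuracy = nearest_centroid_accuracy(
    genotypes[train][:, panel], labels[train],
    genotypes[test][:, panel], labels[test],
)
print("panel size:", len(panel), "held-out accuracy:", accuracy)
```

The panel is selected on the training samples alone and evaluated on held-out samples, which mirrors the cross-validation idea in the abstract on a very small scale. With real data, k would typically be chosen from the number of components that carry population structure, and the accuracy reported here says nothing about the 19 breeds studied in the paper.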