Technow Frank, Gerke Justin
Maize Product Development/Systems and Innovation for Breeding and Seed Products, DuPont Pioneer, Tavistock, Ontario, Canada.
Systems and Innovation for Breeding and Seed Products, DuPont Pioneer, Johnston, Iowa, United States of America.
PLoS One. 2017 Dec 22;12(12):e0190271. doi: 10.1371/journal.pone.0190271. eCollection 2017.
The increased usage of whole-genome selection (WGS) and other molecular evaluation methods in plant breeding relies on the ability to genotype a very large number of untested individuals in each breeding cycle. Many plant breeding programs evaluate large biparental populations of homozygous individuals derived from homozygous parent inbred lines. This structure lends itself to parent-progeny imputation, which transfers the genotype scores of the parents to progeny individuals that are genotyped for a much smaller number of loci. Here we introduce a parent-progeny imputation method that uses a Hidden Markov Model (HMM) to infer individual genotypes from non-barcoded pooled DNA samples of multiple individuals. We demonstrate the method for pools of simulated maize doubled haploids (DH) from biparental populations, genotyped using a genotyping-by-sequencing (GBS) approach for 3,000 loci at 0.125x to 4x coverage. We observed high concordance between true and imputed marker scores, and the HMM produced well-calibrated genotype probabilities that correctly reflected the uncertainty of the imputed scores. Genomic estimated breeding values (GEBV) calculated from the imputed scores closely matched GEBV calculated from the true marker scores. The within-population correlation between these two sets of GEBV approached 0.95 at 1x coverage when pooling two individuals and at 4x coverage when pooling four individuals. In GBS applications, our approach can reduce the genotyping cost per individual by a factor of up to the number of pooled individuals without the need for extra sequencing coverage, thereby enabling cost-effective large-scale genotyping for applications such as WGS in plant breeding.
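The general idea of an HMM over pooled read counts can be illustrated with a small sketch. This is not the authors' implementation. It assumes a simplified model in which the hidden state at each locus is the joint parental origin of all individuals in the pool, every individual recombines independently with a constant probability `r` between adjacent loci, and the number of reads carrying the parent A allele is binomial given the fraction of pooled individuals that carry it. The function names, `r`, the read error rate `err`, and the example read counts are all hypothetical. The sketch reports only the posterior distribution of the pool's parent A allele dosage; assigning haplotypes to specific individuals requires further steps that are not shown.

```python
from itertools import product
from math import comb


def pool_states(k):
    """All joint parental origins for k pooled DH individuals (1 = parent A)."""
    return list(product((0, 1), repeat=k))


def transition_prob(s, t, r):
    """Each individual switches parental origin independently with probability r."""
    p = 1.0
    for a, b in zip(s, t):
        p *= r if a != b else 1.0 - r
    return p


def emission_prob(state, a_reads, total_reads, err):
    """Binomial likelihood of a_reads parent-A reads out of total_reads."""
    if total_reads == 0:
        return 1.0  # no reads at this locus: uninformative
    frac = sum(state) / len(state)
    p = frac * (1.0 - err) + (1.0 - frac) * err  # assumes err > 0
    return comb(total_reads, a_reads) * p**a_reads * (1.0 - p) ** (total_reads - a_reads)


def forward_backward(reads, k, r, err):
    """Posterior over joint pool states per locus (scaled forward-backward)."""
    states = pool_states(k)
    n = len(states)
    T = [[transition_prob(s, t, r) for t in states] for s in states]
    E = [[emission_prob(s, a, tot, err) for s in states] for a, tot in reads]

    fwd, prev = [], [1.0 / n] * n  # uniform prior over joint states
    for i, e in enumerate(E):
        if i == 0:
            cur = [prev[j] * e[j] for j in range(n)]
        else:
            cur = [e[j] * sum(prev[h] * T[h][j] for h in range(n)) for j in range(n)]
        z = sum(cur)
        prev = [c / z for c in cur]
        fwd.append(prev)

    bwd = [[1.0] * n for _ in E]
    for i in range(len(E) - 2, -1, -1):
        cur = [sum(T[j][h] * E[i + 1][h] * bwd[i + 1][h] for h in range(n)) for j in range(n)]
        z = sum(cur)
        bwd[i] = [c / z for c in cur]

    post = []
    for f, b in zip(fwd, bwd):
        g = [x * y for x, y in zip(f, b)]
        z = sum(g)
        post.append([x / z for x in g])
    return states, post


def dosage_posterior(states, post):
    """Collapse joint-state posteriors to P(number of parent-A carriers) per locus."""
    k = len(states[0])
    out = []
    for row in post:
        d = [0.0] * (k + 1)
        for s, p in zip(states, row):
            d[sum(s)] += p
        out.append(d)
    return out


# Hypothetical (parent-A reads, total reads) at six loci for a pool of two individuals.
reads = [(3, 3), (0, 0), (4, 4), (2, 4), (0, 0), (1, 3)]
states, post = forward_backward(reads, k=2, r=0.05, err=0.01)
dosage = dosage_posterior(states, post)
for locus, d in enumerate(dosage):
    print(locus, [round(x, 3) for x in d])
```

Loci without reads, such as the second and fifth in the example, receive their posterior entirely from flanking loci through the transition model. This is how such a model can remain informative at low coverage.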