Department of Horticulture, Washington State University, Pullman, WA, United States of America.
PLoS One. 2023 Feb 7;18(2):e0272888. doi: 10.1371/journal.pone.0272888. eCollection 2023.
Breeders, collection curators, and other germplasm users require both genome-wide and locus-specific genetic information to manage their genetically diverse plant material effectively. SNP arrays have become the preferred platform for generating genome-wide genetic profiles of elite germplasm and could also provide locus-specific genotypic information. However, genotypic information for loci of interest, such as those within PCR-based DNA fingerprinting panels and trait-predictive DNA tests, is not readily extracted from SNP array data, which creates a disconnect between historic and new data sets. This study aimed to establish a method for deducing genotypes at loci of interest from their associated SNP haplotypes, demonstrated for two fruit crops and three locus types: the quantitative trait loci Ma and Ma3 for acidity in apple, the apple fingerprinting microsatellite marker GD12, and the Mendelian trait locus Rf for sweet cherry fruit color. Using phased data from an apple 8K SNP array and a sweet cherry 6K SNP array, unique haplotypes spanning each target locus were associated with the alleles of important breeding parents. These haplotypes were then compared, via identity-by-descent (IBD) or identity-by-state (IBS), with haplotypes present in germplasm important to U.S. apple and cherry breeding programs to deduce the target locus alleles in that germplasm. IBD segments were tracked confidently through pedigrees, whereas confidence in allele identity among IBS segments was based on a shared-length threshold. At least one allele per locus was deduced for 64-93% of the 181 individuals. Deduced Rf and GD12 genotypes were successfully validated against reported and newly obtained genotypes. The approach can efficiently merge and expand genotypic data sets, deducing missing data and identifying errors, and is appropriate for any crop with SNP array data and historic genotypic data sets, especially where linkage disequilibrium is high.
Locus-specific genotypic information extracted from genome-wide SNP data is expected to enhance confidence in management of genetic resources.
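The IBS comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the encoding of a phased haplotype as a string of SNP calls, the measurement of shared length in number of consecutive matching SNPs, and the example threshold are all assumptions made for illustration. The abstract does not state how shared length was measured or which threshold was used.

```python
from typing import Dict, Optional


def shared_run(a: str, b: str, locus: int) -> int:
    """Count consecutive matching SNP calls around index `locus`.

    Assumption: both haplotypes cover the same ordered SNPs, one
    character per SNP, and `locus` is the SNP nearest the target locus.
    """
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same SNPs")
    if a[locus] != b[locus]:
        return 0
    left = locus
    while left > 0 and a[left - 1] == b[left - 1]:
        left -= 1
    right = locus
    while right < len(a) - 1 and a[right + 1] == b[right + 1]:
        right += 1
    return right - left + 1


def deduce_allele(query: str, references: Dict[str, str],
                  locus: int, min_shared: int) -> Optional[str]:
    """Return the target-locus allele of the best-matching reference.

    `references` maps a reference haplotype to its known allele.
    Returns None when the longest shared run is below `min_shared`
    (a hypothetical threshold) or when equally long runs point to
    different alleles.
    """
    best_len = 0
    best_alleles = set()
    for hap, allele in references.items():
        n = shared_run(query, hap, locus)
        if n > best_len:
            best_len, best_alleles = n, {allele}
        elif n == best_len and n > 0:
            best_alleles.add(allele)
    if best_len < min_shared or len(best_alleles) != 1:
        return None
    return next(iter(best_alleles))


# Hypothetical reference haplotypes with placeholder allele labels.
refs = {"ABBABAAB": "allele_1", "ABAABABB": "allele_2"}
print(deduce_allele("ABBABAAA", refs, locus=3, min_shared=5))
```

In this toy case the query shares seven consecutive SNPs with the first reference and three with the second, so the first reference's allele is assigned. Raising `min_shared` above seven would leave the allele undeduced, which corresponds to the individuals for which no allele could be assigned.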