Department of Horticulture, Washington State University, Pullman, WA, United States of America.
Department of Horticultural Science, University of Minnesota, St Paul, MN, United States of America.
PLoS One. 2019 Jun 27;14(6):e0210928. doi: 10.1371/journal.pone.0210928. eCollection 2019.
High-quality genotypic data are a requirement for many genetic analyses. For any crop, errors in genotype calls, phasing of markers, linkage maps, and pedigree records, as well as unnoticed variation in ploidy levels, can lead to spurious marker-locus-trait associations and incorrect origin assignment of alleles to individuals. High-throughput genotyping requires automated scoring, as manual inspection of thousands of scored loci is too time-consuming. However, automated SNP scoring can introduce errors that must be corrected so that the recorded genotypic data are accurate and downstream genetic analyses can be trusted. To enable quick identification of errors in a large genotypic data set, we have developed a comprehensive workflow. This multiple-step workflow is based on inheritance principles and on removal of markers and individuals that do not follow these principles, as demonstrated here for apple, peach, and sweet cherry. Genotypic data were obtained on pedigreed germplasm using 6-9K SNP arrays for each crop, and a subset of well-performing SNPs was created using ASSIsT. Use of correct (and corrected) pedigree records readily identified violations of simple inheritance principles in the genotypic data, a step streamlined with FlexQTL software. Retained SNPs were grouped into haploblocks to increase the information content of single alleles and reduce the computational power needed in downstream genetic analyses. Haploblock borders were defined by recombination locations detected in ancestral generations of cultivars and selections. A second round of inheritance checking was then conducted on haploblock alleles (i.e., haplotypes). High-quality genotypic data sets were created using this workflow for pedigreed collections representing the U.S. breeding germplasm of apple, peach, and sweet cherry evaluated within the RosBREED project. These data sets contain 3855, 4005, and 1617 SNPs spread over 932, 103, and 196 haploblocks in apple, peach, and sweet cherry, respectively.
The highly curated phased SNP and haplotype data sets, as well as the raw iScan data, of germplasm in the apple, peach, and sweet cherry Crop Reference Sets are available through the Genome Database for Rosaceae.
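The inheritance-based check at the core of the workflow can be illustrated with a minimal Python sketch. It is not the procedure implemented in FlexQTL or ASSIsT; it only shows the underlying principle for a diploid, biallelic SNP: an offspring genotype is consistent when it can be formed from one allele of each recorded parent. The function names, the two-letter genotype encoding ("AA", "AB", "BB"), the use of `None` for missing calls, and the example data are all assumptions made for illustration.

```python
from itertools import product


def is_mendelian_consistent(child, parent1, parent2):
    """Return True if the child's diploid genotype can be formed by
    taking one allele from parent1 and one allele from parent2."""
    possible = {tuple(sorted(pair)) for pair in product(parent1, parent2)}
    return tuple(sorted(child)) in possible


def count_inheritance_errors(genotypes, trios):
    """Count parent-offspring inconsistencies per SNP.

    genotypes: {individual: {snp_name: "AB" or None}}  (hypothetical layout)
    trios:     [(child, parent1, parent2), ...] taken from pedigree records
    """
    errors = {}
    for child, p1, p2 in trios:
        for snp, call in genotypes[child].items():
            g1 = genotypes[p1].get(snp)
            g2 = genotypes[p2].get(snp)
            # Skip SNPs with a missing call in any member of the trio.
            if call is None or g1 is None or g2 is None:
                continue
            if not is_mendelian_consistent(call, g1, g2):
                errors[snp] = errors.get(snp, 0) + 1
    return errors


# Illustrative data only (not taken from the published data sets).
demo_genotypes = {
    "mother": {"snp1": "AA", "snp2": "AA", "snp3": "AB"},
    "father": {"snp1": "BB", "snp2": "AB", "snp3": None},
    "seedling": {"snp1": "AB", "snp2": "BB", "snp3": "AA"},
}
demo_trios = [("seedling", "mother", "father")]
demo_errors = count_inheritance_errors(demo_genotypes, demo_trios)
```

In such a sketch, a SNP that accumulates errors across many trios points to a problematic marker, whereas an individual that accumulates errors across many SNPs points to an incorrect pedigree record; distinguishing the two is a judgment the curator still has to make.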