University of Nebraska-Lincoln, Lincoln, NE 68503.
G3 (Bethesda). 2019 Jul 9;9(7):2153-2160. doi: 10.1534/g3.119.400093.
Obtaining genome-wide genotype information for millions of SNPs in soybean [Glycine max (L.) Merr.] often involves completely resequencing a line at 5X or greater coverage. Currently, hundreds of soybean lines have been resequenced at high depth, with their data deposited in the NCBI Short Read Archive. This publicly available dataset may be leveraged as an imputation reference panel in combination with skim (low-coverage) sequencing of new soybean genotypes to economically obtain high-density SNP information. Ninety-nine soybean lines resequenced at an average of 17.1X were used to generate a reference panel, with over 10 million SNPs called using GATK's Haplotype Caller tool. Whole genome resequencing at approximately 1X depth was performed on 114 previously ungenotyped experimental soybean lines. Coverages down to 0.1X were analyzed by randomly subsetting raw reads from the original 1X sequence data. SNPs discovered in the reference panel were genotyped in the experimental lines after aligning to the soybean reference genome, and missing markers were imputed using Beagle 4.1. Sequencing depth of the experimental lines could be reduced to 0.3X while still retaining an accuracy of 97.8%. Accuracy was inversely related to minor allele frequency and highly correlated with marker linkage disequilibrium. The high accuracy of skim sequencing combined with imputation provides a low-cost method for obtaining dense genotypic information that can be used for various genomics applications in soybean.
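The read-subsetting and accuracy steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the Poisson model of per-site coverage is a simplifying assumption, and accuracy is assumed here to mean simple genotype concordance, which the abstract does not define.

```python
import math
import random


def downsample_fraction(original_depth, target_depth):
    """Fraction of reads to keep to go from original_depth to target_depth."""
    if not 0 < target_depth <= original_depth:
        raise ValueError("target depth must be positive and at most the original depth")
    return target_depth / original_depth


def subsample_reads(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction` (random subsetting)."""
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


def expected_uncovered_fraction(depth):
    """Assumed Poisson model: probability that a site receives zero reads."""
    return math.exp(-depth)


def concordance(true_genotypes, imputed_genotypes):
    """Share of markers whose imputed genotype equals the known genotype."""
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("genotype lists must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)


if __name__ == "__main__":
    fraction = downsample_fraction(1.0, 0.3)   # keep about 30% of the 1X reads
    reads = [f"read_{n}" for n in range(10_000)]  # placeholder read identifiers
    kept = subsample_reads(reads, fraction)
    print(f"kept {len(kept)} of {len(reads)} reads")
    print(f"sites with no reads at 0.3X: {expected_uncovered_fraction(0.3):.1%}")
    print(concordance(["0/0", "0/1", "1/1", "0/0"], ["0/0", "0/1", "0/1", "0/0"]))
```

Under the Poisson assumption, roughly three quarters of sites receive no reads at 0.3X, which is why most genotypes at that depth have to be filled in by imputation against the reference panel rather than called directly.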