Shi Fan, Tibbits Josquin, Pasam Raj K, Kay Pippa, Wong Debbie, Petkowski Joanna, Forrest Kerrie L, Hayes Ben J, Akhunova Alina, Davies John, Webb Steven, Spangenberg German C, Akhunov Eduard, Hayden Matthew J, Daetwyler Hans D
Agriculture Victoria, Agriculture Research Division, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia.
School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia.
Theor Appl Genet. 2017 Jul;130(7):1393-1404. doi: 10.1007/s00122-017-2895-3. Epub 2017 Apr 4.
Imputing genotypes from the 90K single-nucleotide polymorphism (SNP) chip to exome sequence in wheat was moderately accurate. We investigated the factors that affect imputation and propose several strategies to improve accuracy. Imputing genetic marker genotypes from low to high density has been proposed as a cost-effective strategy to increase the power of downstream analyses (e.g. genome-wide association studies and genomic prediction) for a given budget. However, imputation is often imperfect and its accuracy depends on several factors. Here, we investigate the effects of reference population selection algorithms, marker density and imputation algorithms (Beagle4 and FImpute) on the accuracy of imputation from low SNP density (9K array) to the Infinium 90K SNP array for a collection of 837 hexaploid wheat Watkins landrace accessions. Based on these results, we then used the best performing reference selection and imputation algorithms to investigate imputation from 90K to exome sequence for a collection of 246 globally diverse wheat accessions. Accession-to-nearest-entry and genomic relationship-based methods were the best performing selection algorithms, and FImpute resulted in higher accuracy and was more efficient than Beagle4. The accuracy of imputing exome capture SNPs was approximately 0.71, comparable to that of imputing from 9K to 90K. This relatively low imputation accuracy is in part due to inconsistency between 90K and exome sequence formats. We also found that the accuracy of imputation could be substantially improved to 0.82 when an equivalent number of exome SNPs, instead of the 90K SNPs on the existing array, was chosen as the lower density set. We present a number of recommendations to increase the accuracy of exome imputation.
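The abstract reports imputation accuracies (about 0.71 and 0.82) without stating how they were computed. As a minimal sketch only, assuming accuracy is taken as the mean per-SNP Pearson correlation between observed and imputed genotypes at masked markers (a common convention, not confirmed by the abstract), the calculation could look like the following. The function names, the 0/1/2 allele-count coding and the toy data are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns None when either sequence has zero variance
    (e.g. a monomorphic SNP), since the correlation is undefined.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def imputation_accuracy(true_genotypes, imputed_genotypes):
    """Mean per-SNP correlation between true and imputed genotypes.

    Both arguments are lists of rows (one row per accession), with one
    column per masked SNP, coded as allele counts 0/1/2 (an assumption).
    SNPs with an undefined correlation are skipped. Returns None if no
    SNP yields a defined correlation.
    """
    n_snps = len(true_genotypes[0])
    per_snp = []
    for j in range(n_snps):
        observed = [row[j] for row in true_genotypes]
        imputed = [row[j] for row in imputed_genotypes]
        r = pearson(observed, imputed)
        if r is not None:
            per_snp.append(r)
    if not per_snp:
        return None
    return sum(per_snp) / len(per_snp)


# Toy data: 4 accessions x 3 masked SNPs (hypothetical values).
true_toy = [
    [0, 2, 2],
    [2, 0, 2],
    [0, 2, 2],
    [2, 2, 2],
]
# One genotype at the first SNP (second accession) is imputed wrongly.
imputed_toy = [
    [0, 2, 2],
    [0, 0, 2],
    [0, 2, 2],
    [2, 2, 2],
]

accuracy_perfect = imputation_accuracy(true_toy, true_toy)
accuracy_with_error = imputation_accuracy(true_toy, imputed_toy)
```

In this sketch the third SNP is monomorphic and is therefore left out of the average. Other summaries, such as the proportion of correctly imputed genotypes or a per-accession correlation, are equally plausible readings of "accuracy" and would give different numbers.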