Gundappa Manu Kumar, Robledo Diego, Hamilton Alastair, Houston Ross D, Prendergast James G D, Macqueen Daniel J
Animal Breeding and Genomics, Wageningen University & Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands.
The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK.
Genet Sel Evol. 2025 Mar 28;57(1):16. doi: 10.1186/s12711-025-00962-6.
Whole genome sequencing (WGS), despite its advantages, is yet to replace methods for genotyping single nucleotide variants (SNVs) such as SNP arrays and targeted genotyping assays. Structural variants (SVs) have larger effects on traits than SNVs, but are more challenging to accurately genotype. Using low-coverage WGS with genotype imputation offers a cost-effective strategy to achieve genome-wide variant coverage, but is yet to be tested for SVs.
Here, we investigate combined SNV and SV imputation with low-coverage WGS data in Atlantic salmon (Salmo salar). As the reference panel, we used genotypes for high-confidence SVs and SNVs for n = 365 wild individuals sampled from diverse populations. We also generated 15× WGS data (n = 20 samples) for a commercial population external to the reference panel, and called SVs and SNVs with gold-standard approaches. An imputation method selected for its established performance using low-coverage sequencing data (GLIMPSE) was tested at WGS depths of 1×, 2×, 3×, and 4× for samples within and external to the reference panel.
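As a rough illustration of how the tested depths relate to the 15× data, the sketch below computes the fraction of reads that would need to be retained to reach each target depth. It assumes the lower depths are obtained by random read subsampling, which the abstract does not state; the function name `downsample_fractions` is hypothetical.

```python
from typing import Dict, Iterable


def downsample_fractions(original_depth: float,
                         target_depths: Iterable[float]) -> Dict[float, float]:
    """Return the fraction of reads to keep for each target depth.

    Assumption (not from the abstract): low-coverage data are simulated by
    randomly subsampling reads, so expected depth scales linearly with the
    fraction of reads retained.
    """
    if original_depth <= 0:
        raise ValueError("original_depth must be positive")
    fractions = {}
    for target in target_depths:
        if target <= 0 or target > original_depth:
            raise ValueError(f"target depth {target} is outside (0, {original_depth}]")
        fractions[target] = target / original_depth
    return fractions


# Depths named in the abstract: 15x source data, tested at 1x to 4x.
fractions = downsample_fractions(15, [1, 2, 3, 4])
for depth, fraction in fractions.items():
    print(f"{depth}x: keep {fraction:.3f} of reads")
```

Under this assumption, each fraction could be passed to a read-subsampling tool before genotype likelihoods are computed.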
SNVs were imputed with high accuracy and recall across all WGS depths, including for samples outside the reference panel. For SVs, we compared imputation based purely on linkage disequilibrium (LD) with SNVs to imputation supplemented with SV genotype likelihoods (GLs) from low-coverage WGS. Including SV GLs increased imputation accuracy, but at a cost to recall, requiring 3-4× depth for best performance. Combining strategies allowed us to capture 84% of the reference panel deletions with 87% accuracy at 1× depth. We also show that SV length affects imputation performance, with provision of SV GLs greatly enhancing accuracy for the longest SVs in the dataset.
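The abstract does not define accuracy and recall, so the following is only a minimal sketch under assumed definitions: recall as the share of truth-set variants that receive an imputed genotype, and accuracy as the share of those imputed genotypes that match the truth set. The function name and the dictionary-based representation are hypothetical.

```python
from typing import Dict, Optional, Tuple


def imputation_metrics(truth: Dict[str, int],
                       imputed: Dict[str, Optional[int]]) -> Tuple[float, float]:
    """Return (recall, accuracy) for one sample.

    Genotypes are alternate-allele counts (0, 1 or 2), keyed by variant ID.
    An imputed value of None means no genotype was retained for that variant,
    e.g. because it failed a confidence filter.

    Assumed definitions (not taken from the abstract):
      recall   = variants with an imputed genotype / variants in the truth set
      accuracy = imputed genotypes matching truth / variants with an imputed genotype
    """
    # Keep only variants present in the truth set with a retained genotype.
    called = {vid: gt for vid, gt in imputed.items()
              if vid in truth and gt is not None}
    recall = len(called) / len(truth) if truth else 0.0
    correct = sum(1 for vid, gt in called.items() if gt == truth[vid])
    accuracy = correct / len(called) if called else 0.0
    return recall, accuracy


# Toy data for illustration only; these are not values from the study.
truth = {"del1": 1, "del2": 0, "del3": 2, "del4": 1}
imputed = {"del1": 1, "del2": 0, "del3": 1, "del4": None}
recall, accuracy = imputation_metrics(truth, imputed)
print(f"recall={recall:.2f}, accuracy={accuracy:.2f}")
```

With these definitions, a stricter filter on imputed genotypes lowers recall while tending to raise accuracy, which is the kind of trade-off the abstract reports when SV GLs are included.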
This study highlights the promise of reference panel imputation using low-coverage WGS, including novel opportunities to enhance the resolution of genome-wide association studies by capturing SVs.