Department of Immunology, Imperial College London, London W2 1PG, United Kingdom.
Genetics. 2013 Jan;193(1):243-53. doi: 10.1534/genetics.112.145599. Epub 2012 Nov 12.
In recent years it has emerged that structural variants have a substantial impact on genomic variation. Inversion polymorphisms represent a significant class of structural variant, and despite the challenges in their detection, data on inversions in the human genome are increasing rapidly. Statistical methods for inferring parameters such as the recombination rate and the selection coefficient have generally been developed without accounting for the presence of inversions. Here we exploit new software for simulating inversions in population genetic data, invertFREGENE, to assess the potential impact of inversions on such methods. Using data simulated by invertFREGENE, as well as real data from several sources, we test whether large inversions have a disruptive effect on widely applied population genetics methods for inferring recombination rates, for detecting selection, and for controlling for population structure in genome-wide association studies (GWAS). We find that recombination rates estimated by LDhat are biased downward at inversion loci relative to the true contemporary recombination rates at those loci, but that recombination hotspots are not falsely inferred at inversion breakpoints, as might have been expected. We find that the integrated haplotype score (iHS) method for detecting selection appears robust to the presence of inversions. Finally, we observe a strong bias in the genome-wide results of principal components analysis (PCA), used to control for population structure in GWAS, in the presence of even a single large inversion, confirming the need to thin SNPs by linkage disequilibrium at large physical distances to obtain unbiased results.
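The PCA finding can be illustrated with a toy example. The sketch below is not the authors' pipeline and does not use invertFREGENE: it assumes a crude stand-in in which a block of SNPs simply tracks each individual's inversion genotype with a little noise, and it applies a greedy r² filter with no distance window, so that long-range LD is also removed. The function names (`simulate_panel`, `thin_by_ld`, `pc1_scores`), the r² threshold of 0.2, and all simulation parameters are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_panel(n=200, n_background=300, n_block=100, noise=0.05, seed=0):
    """Toy genotype panel (assumption, not invertFREGENE): independent
    background SNPs plus a block of SNPs that track an inversion genotype."""
    rng = np.random.default_rng(seed)
    inversion = rng.binomial(2, 0.5, size=n)            # 0/1/2 inverted copies
    background = rng.binomial(2, 0.3, size=(n, n_background))
    block = np.repeat(inversion[:, None], n_block, axis=1)
    # Replace a small fraction of block genotypes with random draws.
    flip = rng.random(block.shape) < noise
    block = np.where(flip, rng.binomial(2, 0.5, size=block.shape), block)
    return np.hstack([background, block]), inversion


def thin_by_ld(genotypes, r2_max=0.2):
    """Greedy LD thinning: keep a SNP only if its r^2 with every SNP already
    kept is <= r2_max, whatever the physical distance between them."""
    g = np.asarray(genotypes, dtype=float)
    polymorphic = np.flatnonzero(g.std(axis=0) > 0)     # skip monomorphic SNPs
    r2 = np.atleast_2d(np.corrcoef(g[:, polymorphic], rowvar=False)) ** 2
    kept = []                                           # indices into `polymorphic`
    for j in range(len(polymorphic)):
        if not kept or r2[j, kept].max() <= r2_max:
            kept.append(j)
    return polymorphic[kept]


def pc1_scores(genotypes):
    """First principal component scores of the standardised genotype matrix."""
    g = np.asarray(genotypes, dtype=float)
    g = g[:, g.std(axis=0) > 0]
    z = (g - g.mean(axis=0)) / g.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, 0] * s[0]


if __name__ == "__main__":
    genotypes, inversion = simulate_panel()
    kept = thin_by_ld(genotypes)
    for label, g in [("all SNPs", genotypes), ("LD-thinned", genotypes[:, kept])]:
        r = np.corrcoef(pc1_scores(g), inversion)[0, 1]
        print(f"{label}: {g.shape[1]} SNPs, |corr(PC1, inversion)| = {abs(r):.2f}")
```

In this toy setting the first principal component of the unthinned panel mostly reflects inversion genotype rather than any genome-wide structure, and thinning reduces the inversion block to a single representative SNP. Real analyses would typically rely on established tools and on thresholds chosen for the data at hand.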