Waples R K, Larson W A, Waples R S
School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA, USA.
Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Seattle, WA, USA.
Heredity (Edinb). 2016 Oct;117(4):233-40. doi: 10.1038/hdy.2016.60. Epub 2016 Aug 24.
Contemporary effective population size (Ne) can be estimated using linkage disequilibrium (LD) observed across pairs of loci presumed to be selectively neutral and unlinked. This method has been commonly applied to data sets containing 10-100 loci to inform conservation and study population demography. Performance of these Ne estimates could be improved by incorporating data from thousands of loci. However, these thousands of loci exist on a limited number of chromosomes, ensuring that some fraction will be physically linked. Linked loci have elevated LD due to limited recombination, which if not accounted for can cause Ne estimates to be downwardly biased. Here, we present results from coalescent and forward simulations designed to evaluate the bias of LD-based Ne estimates (N̂e). Contrary to common perceptions, increasing the number of loci does not increase the magnitude of linkage. Although we show it is possible to identify some pairs of loci that produce unusually large r² values, simply removing large r² values is not a reliable way to eliminate bias. Fortunately, the magnitude of bias in N̂e is strongly and negatively correlated with the process of recombination, including the number of chromosomes and their length, and this relationship provides a general way to adjust for bias. Additionally, we show that with thousands of loci, precision of N̂e is much lower than expected based on the assumption that each pair of loci provides completely independent information.
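Two quantities behind the abstract's reasoning can be illustrated with a short sketch. The first function computes the fraction of locus pairs whose two loci sit on the same chromosome. Assuming loci are spread evenly across chromosomes, this fraction approaches a limit set by the number of chromosomes (about 1/C for C equal chromosomes) rather than growing with the number of loci. This is consistent with the statement that adding loci does not increase the magnitude of linkage. The second function is the simple, uncorrected Hill (1981) approximation E(r²) ≈ 1/(3Ne) + 1/S for unlinked loci, where S is sample size. It is not the bias-corrected estimator evaluated in the paper. Both function names (`syntenic_pair_fraction`, `hill_ne`) are hypothetical and used here only for illustration.

```python
from math import comb


def syntenic_pair_fraction(loci_per_chrom):
    """Fraction of all locus pairs that share a chromosome.

    Illustrative assumption: every pair on the same chromosome counts as
    'linked' and every pair on different chromosomes as unlinked.
    """
    total = sum(loci_per_chrom)
    same_chrom_pairs = sum(comb(n, 2) for n in loci_per_chrom)
    return same_chrom_pairs / comb(total, 2)


def hill_ne(mean_r2, sample_size):
    """Uncorrected Hill (1981) LD estimate of Ne for unlinked loci.

    Rearranges E(r^2) ~ 1/(3 Ne) + 1/S to Ne ~ 1 / (3 (r^2 - 1/S)).
    """
    excess = mean_r2 - 1.0 / sample_size
    if excess <= 0:
        # No LD beyond the sampling expectation: estimate is infinite.
        return float("inf")
    return 1.0 / (3.0 * excess)


if __name__ == "__main__":
    # 20 equal chromosomes: the syntenic fraction levels off near 1/20.
    for loci_each in (10, 100, 1000):
        frac = syntenic_pair_fraction([loci_each] * 20)
        print(f"{loci_each * 20:>6} loci: {frac:.4f} of pairs share a chromosome")
    # Mean r^2 exceeding 1/S by 1/300 with S = 50 gives Ne ~ 100.
    print(hill_ne(1 / 50 + 1 / 300, 50))
```

Because same-chromosome pairs make up a roughly fixed share of all pairs, their elevated r² pulls mean r² up. Under this simple formula, that lowers N̂e, which matches the downward bias described above.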