School of Aquatic and Fishery Sciences, University of Washington, Seattle, Washington, USA.
Mol Ecol Resour. 2024 Jan;24(1):e13879. doi: 10.1111/1755-0998.13879. Epub 2023 Oct 24.
The method to estimate contemporary effective population size (Ne) based on patterns of linkage disequilibrium (LD) at unlinked loci has been widely applied to natural and managed populations. The underlying model makes many simplifying assumptions, most of which have been evaluated in numerous studies published over the last two decades. Here, these performance evaluations are reviewed and summarized, with a focus on information that facilitates practical application to real populations in nature. Potential sources of bias that are discussed include calculation of r² (a measure of LD), adjustments for sampling error, physical linkage, age structure, migration and spatial structure, mutation and selection, mating systems, changes in abundance, rare alleles, missing data, genotyping errors, data filtering choices and methods for combining multiple Ne estimates. Factors that affect precision are reviewed, including pseudoreplication that limits the information gained from large genomics datasets, constraints imposed by small samples of individuals, and the challenges in obtaining robust estimates for large populations. Topics that merit further research include the potential to weight r² values by allele frequency, lump samples of individuals, use genotypic likelihoods rather than called genotypes, prune large LD values and apply the method to species practising partial monogamy.
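As a rough illustration of the core idea summarized above, the following Python sketch computes a mean r² across pairs of loci and converts it to an Ne estimate. It uses only the widely cited first-order expectation for unlinked loci under random mating, E[r²] ≈ 1/(3Ne) + 1/S, where S is the number of individuals sampled. The function names `mean_r2` and `ld_ne` are hypothetical, r² is approximated here by the squared Pearson correlation of allele dosages, and none of the bias adjustments discussed in the review (sampling-error corrections, rare-allele filtering, physical linkage, missing data) are included, so this should be read as a sketch under those assumptions rather than the method evaluated in the paper.

```python
import numpy as np


def mean_r2(genotypes):
    """Mean squared correlation over all pairs of loci.

    genotypes: array of shape (S individuals, L loci) holding allele
    dosages coded 0/1/2. Assumption: no missing data. The squared Pearson
    correlation of dosages stands in for r^2 here.
    """
    g = np.asarray(genotypes, dtype=float)
    # Drop monomorphic loci, whose correlation is undefined.
    g = g[:, g.std(axis=0) > 0]
    if g.shape[1] < 2:
        raise ValueError("need at least two polymorphic loci")
    corr = np.corrcoef(g, rowvar=False)          # L x L correlation matrix
    upper = np.triu_indices(corr.shape[0], k=1)  # each locus pair once
    return float(np.mean(corr[upper] ** 2))


def ld_ne(mean_r2_obs, sample_size):
    """Ne from the first-order expectation E[r^2] ~ 1/(3 Ne) + 1/S.

    Assumption: unlinked loci and random mating. A non-positive drift
    signal (all observed LD attributable to sampling) is returned as
    infinity.
    """
    drift_r2 = mean_r2_obs - 1.0 / sample_size   # remove sampling term
    if drift_r2 <= 0:
        return float("inf")
    return 1.0 / (3.0 * drift_r2)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical toy data: 50 individuals, 100 independent loci.
    toy = rng.binomial(2, 0.5, size=(50, 100))
    print(ld_ne(mean_r2(toy), sample_size=toy.shape[0]))
```

With independently simulated loci, as in the toy data, the drift signal is close to zero, so the estimate is typically very large or infinite. This reflects the difficulty, noted in the abstract, of obtaining robust estimates for large populations.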