Institute of Zoology, Zoological Society of London, London, UK.
Mol Ecol Resour. 2018 Jan;18(1):41-54. doi: 10.1111/1755-0998.12708. Epub 2017 Sep 18.
Many molecular ecology analyses assume the genotyped individuals are sampled at random from a population and thus are representative of the population. Realistically, however, a sample may contain excessive close relatives (ECR) because, for example, localized juveniles are drawn from fecund species. Our knowledge is limited about how ECR affect the routinely conducted elementary genetics analyses, and how ECR are best dealt with to yield unbiased and accurate parameter estimates. This study quantifies the effects of ECR on some popular population genetics analyses of marker data, including the estimation of allele frequencies, F-statistics, expected heterozygosity (H_E), effective and observed numbers of alleles, and the tests of Hardy-Weinberg equilibrium (HWE) and linkage equilibrium (LE). It also investigates several strategies for handling ECR to mitigate their impact and to yield accurate parameter estimates. My analytical work, assisted by simulations, shows that ECR have large and global effects on all of the above marker analyses. The naïve approach of simply ignoring ECR could yield low-precision and often biased parameter estimates, and could cause too many false rejections of HWE and LE. The bold approach, which simply identifies and removes ECR, and the cautious approach, which estimates target parameters (e.g., H_E) by accounting for ECR and using naïve allele frequency estimates, eliminate the bias and the false HWE and LE rejections, but could reduce estimation precision substantially. The likelihood approach, which accounts for ECR in estimating allele frequencies and thus target parameters relying on allele frequencies, usually yields unbiased and the most accurate parameter estimates. Which of the four approaches is the most effective and efficient may depend on the particular marker analysis to be conducted. The results are discussed in the context of using marker data for understanding population properties and marker properties.
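The precision loss described above for naïve allele frequency estimates can be illustrated with a small simulation. The following is a minimal sketch, not the analysis used in the study: it assumes a single biallelic locus, random mating, and samples made up of unrelated full-sib families. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def draw_genotype(p, rng):
    """Count (0, 1 or 2) of the reference allele in one random diploid genotype."""
    return (rng.random() < p) + (rng.random() < p)


def transmit(parent, rng):
    """Allele (1 = reference, 0 = other) passed on by a parent with `parent` copies."""
    if parent == 1:
        return int(rng.random() < 0.5)  # heterozygote: either allele, equal odds
    return parent // 2  # homozygote: 0 -> 0, 2 -> 1


def sample_frequency(p, n_families, sibs_per_family, rng):
    """Naive allele frequency estimate from a sample of full-sib families."""
    copies = 0
    for _ in range(n_families):
        mother = draw_genotype(p, rng)
        father = draw_genotype(p, rng)
        for _ in range(sibs_per_family):
            copies += transmit(mother, rng) + transmit(father, rng)
    return copies / (2 * n_families * sibs_per_family)


def sampling_variance(p, n_families, sibs_per_family, replicates=2000, seed=1):
    """Mean and variance of the naive estimate across simulated samples."""
    rng = random.Random(seed)
    estimates = [
        sample_frequency(p, n_families, sibs_per_family, rng)
        for _ in range(replicates)
    ]
    return statistics.mean(estimates), statistics.variance(estimates)


if __name__ == "__main__":
    p = 0.3  # assumed true allele frequency
    designs = {
        "random sample (50 unrelated individuals)": (50, 1),
        "naive, ECR kept (5 families x 10 sibs)": (5, 10),
        "bold, ECR removed (1 sib per family)": (5, 1),
    }
    for label, (families, sibs) in designs.items():
        mean, var = sampling_variance(p, families, sibs)
        print(f"{label}: mean = {mean:.3f}, variance = {var:.5f}")
```

In this toy setting all three designs give estimates centred near the true frequency, but the sample dominated by siblings is noticeably less precise than a random sample of the same size, and discarding all but one sibling per family loses further precision because so few individuals remain. The bias and the false HWE and LE rejections discussed in the abstract are not modelled here.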