Department of Biostatistics, University of Washington, Seattle, Washington 98195-7232, USA.
Genet Epidemiol. 2013 Sep;37(6):635-41. doi: 10.1002/gepi.21737. Epub 2013 Jun 5.
The proportion of the genome that is shared identical by descent (IBD) between pairs of individuals is often estimated in studies involving genome-wide SNP data. These estimates can be used to check pedigrees, estimate heritability, and adjust association analyses. We focus on the method of moments technique, as implemented in PLINK [Purcell et al., 2007] and other software, which estimates the proportions of the genome at which two individuals share 0, 1, or 2 alleles IBD. This technique assumes that the study sample is drawn from a single, homogeneous, randomly mating population. The assumption is violated if pedigree founders are drawn from multiple populations or include admixed individuals. In the presence of population structure, the method of moments estimator has an inflated variance and can be biased because it relies on sample-based allele frequency estimates. In the case of the PLINK estimator, which truncates genome-wide sharing estimates at zero and one to generate biologically interpretable results, the bias is most often toward overestimation of relatedness between ancestrally similar individuals. Using simulated pedigrees, we demonstrate and quantify the behavior of the PLINK method of moments estimator under different population structure conditions. We also propose a simple method based on SNP pruning for improving genome-wide IBD estimates when the assumption of a single, homogeneous population is violated.
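As a rough illustration of the method of moments idea described above, the following Python sketch estimates the proportions of the genome shared 0, 1, or 2 alleles IBD for one pair of individuals by matching observed identity-by-state (IBS) counts to their expectations under Hardy-Weinberg proportions. It is not PLINK's implementation: it uses large-sample genotype probabilities without any finite-sample correction, and the function names (`mom_ibd`, `bound_to_simplex`), the 0/1/2 genotype coding, and the bounding rule are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def bound_to_simplex(z0: float, z1: float, z2: float) -> Tuple[float, float, float]:
    """Force raw estimates into [0, 1] (simple illustrative rule, an assumption)."""
    z: List[float] = [z0, z1, z2]
    # If any proportion exceeds 1, assign all sharing to that IBD state.
    for i in range(3):
        if z[i] > 1.0:
            out = [0.0, 0.0, 0.0]
            out[i] = 1.0
            return (out[0], out[1], out[2])
    # Otherwise set negative proportions to 0 and rescale the rest to sum to 1.
    for i in range(3):
        if z[i] < 0.0:
            z[i] = 0.0
            total = sum(z)
            z = [v / total for v in z]
    return (z[0], z[1], z[2])


def mom_ibd(
    g1: Sequence[int], g2: Sequence[int], freqs: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Return (z0, z1, z2, pi_hat) for one pair of individuals.

    g1, g2: genotypes coded as 0, 1, or 2 copies of a reference allele.
    freqs:  sample-based reference allele frequencies, one per SNP.
    """
    if len(g1) == 0 or not (len(g1) == len(g2) == len(freqs)):
        raise ValueError("inputs must be non-empty and of equal length")

    observed = [0, 0, 0]  # number of SNPs at which the pair is IBS 0, 1, 2
    e00 = e10 = e20 = e11 = e21 = 0.0  # expected counts, e<IBS><IBD>
    for a, b, p in zip(g1, g2, freqs):
        q = 1.0 - p
        observed[2 - abs(a - b)] += 1
        # Unrelated pair (IBD = 0): two independent genotypes.
        e00 += 2 * p**2 * q**2
        e10 += 4 * p**3 * q + 4 * p * q**3
        e20 += p**4 + q**4 + 4 * p**2 * q**2
        # One allele shared IBD: IBS 0 is impossible.
        e11 += 2 * p * q
        e21 += p**2 + q**2
    e22 = float(len(g1))  # two alleles shared IBD: always IBS 2

    if e00 == 0.0 or e11 == 0.0:
        raise ValueError("need at least one polymorphic SNP")

    # Solve the moment equations in order: IBS 0, then IBS 1, then IBS 2.
    z0 = observed[0] / e00
    z1 = (observed[1] - z0 * e10) / e11
    z2 = (observed[2] - z0 * e20 - z1 * e21) / e22

    z0, z1, z2 = bound_to_simplex(z0, z1, z2)
    pi_hat = z1 / 2.0 + z2  # genome-wide proportion shared IBD
    return z0, z1, z2, pi_hat
```

For example, `mom_ibd([0, 1, 2, 1], [0, 1, 2, 1], [0.5, 0.3, 0.2, 0.4])` compares a genotype vector with itself and yields a `pi_hat` of 1.0. Because `freqs` is estimated from the study sample, the expected counts are wrong for pairs whose ancestry differs from the sample average, which is how population structure enters the estimate.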