Sun M, Jobling M A, Taliun D, Pramstaller P P, Egeland T, Sheehan N A
Department of Health Sciences, University of Leicester, UK.
Department of Genetics, University of Leicester, UK.
Theor Popul Biol. 2016 Feb;107:14-25. doi: 10.1016/j.tpb.2015.10.002. Epub 2015 Oct 16.
There has been recent interest in the exploitation of readily available dense genome scan marker data for the identification of relatives. However, there are conflicting findings on how informative these data are in practical situations and, in particular, sets of thinned markers are often used with no concrete justification for the chosen spacing. We explore the potential usefulness of dense single nucleotide polymorphism (SNP) arrays for this application with a focus on inferring distant relative pairs. We distinguish between relationship estimation, as defined by a pedigree connecting the two individuals of interest, and estimation of general relatedness as would be provided by a kinship coefficient or a coefficient of relatedness. Since our primary interest is in the former case, we adopt a pedigree likelihood approach. We consider the effect of additional SNPs and data on an additional typed relative, together with choice of that relative, on relationship inference. We also consider the effect of linkage disequilibrium. When overall relatedness, rather than the specific relationship, would suffice, we propose an approximate approach that is easy to implement and appears to compete well with a popular moment-based estimator and a recent maximum likelihood approach based on chromosomal sharing. We conclude that denser marker data are more informative for distant relatives. However, linkage disequilibrium cannot be ignored and will be the main limiting factor for applications to real data.
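To illustrate the distinction drawn above between a specific pedigree relationship and overall relatedness, the following minimal sketch computes the expected kinship coefficient of an outbred pair using the standard path-counting rule, phi = k * (1/2)^(m + 1), where k is the number of common ancestors and m is the number of meioses on the path joining the pair through each of them. This is not the pedigree likelihood method or the approximate approach described in the abstract. The function name `expected_kinship` and the table of example relationships are assumptions introduced for illustration, and the rule as coded assumes non-inbred common ancestors and paths of equal length.

```python
from fractions import Fraction


def expected_kinship(n_common_ancestors: int, n_meioses: int) -> Fraction:
    """Expected kinship coefficient for an outbred pair of relatives.

    Assumes every connecting path has the same number of meioses and that
    the common ancestors are themselves non-inbred (illustrative only).
    """
    return n_common_ancestors * Fraction(1, 2) ** (n_meioses + 1)


# Hypothetical lookup: relationship -> (common ancestors, meioses per path)
EXAMPLES = {
    "parent-child": (1, 1),
    "full siblings": (2, 2),
    "half siblings": (1, 2),
    "grandparent-grandchild": (1, 2),
    "uncle-niece": (2, 3),
    "first cousins": (2, 4),
    "second cousins": (2, 6),
}

if __name__ == "__main__":
    for name, (k, m) in EXAMPLES.items():
        phi = expected_kinship(k, m)
        # The coefficient of relatedness is twice the kinship coefficient.
        print(f"{name:24s} kinship={phi}  relatedness={2 * phi}")
```

Half siblings, grandparent-grandchild and uncle-niece pairs all have an expected kinship of 1/8, so a kinship estimate alone cannot separate these pedigrees. The expected value also halves with each additional meiosis, which is one way to see why distant pairs are harder to identify.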