Shchur Vladimir, Nielsen Rasmus
Departments of Integrative Biology and Statistics, University of California Berkeley, 4098 Valley Life Sciences Building (VLSB), Berkeley, CA, 94720-3140, USA.
Museum of Natural History, University of Copenhagen, Øster Voldgade 5-7, 1350, Copenhagen, Denmark.
J Math Biol. 2018 Nov;77(5):1279-1298. doi: 10.1007/s00285-018-1252-8. Epub 2018 Jun 6.
The number of individuals in a random sample who have close relatives in the same sample is a quantity of interest when designing genome-wide association studies and other cohort-based genetic and non-genetic studies. In this paper, we develop expressions for the distribution and expectation of the number of p-th cousins in a sample from a population of size N under two diploid Wright-Fisher models. We also develop simple asymptotic expressions for large values of N. For example, the expected proportion of individuals with at least one p-th cousin in a sample of K individuals, for a diploid dioecious Wright-Fisher model, is approximately [Formula: see text]. Our results show that a substantial fraction of individuals in the sample will have at least a second cousin if the sampling fraction (K / N) is on the order of [Formula: see text]. This confirms that, for large cohort samples, relatedness among individuals cannot easily be ignored.
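The quantity described in the abstract can be explored numerically. The following is a minimal Monte Carlo sketch, not the authors' derivation, under assumptions that are not taken from the paper: each individual draws one father and one mother uniformly at random from the previous generation (non-monogamous mating), the population has N/2 males and N/2 females in every generation, and two sampled individuals are counted as related if they share at least one ancestor p+1 generations back (which also counts closer relatives). The function name `simulate_fraction_with_relative` is hypothetical.

```python
import random
from collections import Counter


def simulate_fraction_with_relative(N, K, p, rng):
    """One replicate: fraction of K sampled individuals that share at least one
    ancestor p+1 generations back with another sampled individual.

    Assumptions (illustrative only): dioecious Wright-Fisher population with
    N/2 males (ids 0..N/2-1) and N/2 females (ids N/2..N-1) per generation,
    and parents chosen uniformly at random, independently for each child.
    """
    if N % 2 != 0 or not (1 <= K <= N):
        raise ValueError("N must be even and 1 <= K <= N")
    half = N // 2
    sample = rng.sample(range(N), K)
    # ancestors[j] holds the ids of sampled individual j's ancestors
    # in the generation currently being traced.
    ancestors = [{i} for i in sample]
    for _ in range(p + 1):
        # Fix the parents of every individual in the current generation, so
        # that ancestors shared by different lineages get the same parents.
        father = [rng.randrange(half) for _ in range(N)]
        mother = [half + rng.randrange(half) for _ in range(N)]
        ancestors = [
            {parent for a in anc for parent in (father[a], mother[a])}
            for anc in ancestors
        ]
    # Count how many sampled individuals descend from each ancestor.
    counts = Counter(a for anc in ancestors for a in anc)
    related = sum(1 for anc in ancestors if any(counts[a] > 1 for a in anc))
    return related / K


if __name__ == "__main__":
    rng = random.Random(1)
    # Average over replicates for second cousins (p = 2) at K / N = 0.01.
    reps = [simulate_fraction_with_relative(10_000, 100, 2, rng) for _ in range(20)]
    print(sum(reps) / len(reps))
```

Averaging the returned fraction over many replicates and varying K / N gives a rough empirical check on how quickly relatedness accumulates in a sample; the numbers it produces depend on the mating assumption above and are not a substitute for the paper's expressions.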