Arbisser Ilana M, Jewett Ethan M, Rosenberg Noah A
Department of Biology, Stanford University, Stanford, CA 94305, USA.
Departments of Electrical Engineering & Computer Science and Statistics, University of California, Berkeley, CA 94720, USA.
Theor Popul Biol. 2018 Jul;122:46-56. doi: 10.1016/j.tpb.2017.10.008. Epub 2017 Nov 10.
Many statistics that examine genetic variation depend on the underlying shapes of genealogical trees. Under the coalescent model, we investigate the joint distribution of two quantities that describe genealogical tree shape: tree height and tree length. We derive a recursive formula for their exact joint distribution under a demographic model of a constant-sized population. We obtain approximations for the mean and variance of the ratio of tree height to tree length, using them to show that this ratio converges in probability to 0 as the sample size increases. We find that as the sample size increases, the correlation coefficient for tree height and length approaches (π² − 6)/[π√(2π² − 18)] ≈ 0.9340. Using simulations, we examine the joint distribution of height and length under demographic models with population growth and population subdivision. We interpret the joint distribution in relation to problems of interest in data analysis, including inference of the time to the most recent common ancestor. The results assist in understanding the influences of demographic histories on two fundamental features of tree shape.
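The limiting correlation quoted in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code, and it assumes the standard constant-size coalescent in which the time T_k spent with k lineages is exponential with rate k(k−1)/2, independently across k. Under that assumption, tree height is H = Σ T_k and tree length is L = Σ k·T_k, so Var(H), Var(L), and Cov(H, L) are simple sums over Var(T_k) = 4/[k(k−1)]². All function names here are illustrative and do not come from the paper.

```python
import math
import random


def exact_correlation(n):
    """Correlation of tree height and length for sample size n (n >= 2).

    Assumes independent T_k ~ Exp(rate k(k-1)/2) for k = 2..n.
    """
    var_t = {k: 4.0 / (k * (k - 1)) ** 2 for k in range(2, n + 1)}
    var_h = sum(var_t.values())                       # H = sum of T_k
    var_l = sum(k * k * v for k, v in var_t.items())  # L = sum of k * T_k
    cov = sum(k * v for k, v in var_t.items())        # Cov(H, L)
    return cov / math.sqrt(var_h * var_l)


def limiting_correlation():
    """Closed form stated in the abstract for n -> infinity."""
    return (math.pi ** 2 - 6) / (math.pi * math.sqrt(2 * math.pi ** 2 - 18))


def simulate_height_length(n, rng):
    """Draw one (height, length) pair by sampling the coalescence times."""
    height = 0.0
    length = 0.0
    for k in range(n, 1, -1):
        t = rng.expovariate(k * (k - 1) / 2)  # waiting time with k lineages
        height += t
        length += k * t
    return height, length


def sample_correlation(n, reps, seed=0):
    """Monte Carlo estimate of the height-length correlation."""
    rng = random.Random(seed)
    pairs = [simulate_height_length(n, rng) for _ in range(reps)]
    mean_h = sum(h for h, _ in pairs) / reps
    mean_l = sum(l for _, l in pairs) / reps
    s_hh = sum((h - mean_h) ** 2 for h, _ in pairs)
    s_ll = sum((l - mean_l) ** 2 for _, l in pairs)
    s_hl = sum((h - mean_h) * (l - mean_l) for h, l in pairs)
    return s_hl / math.sqrt(s_hh * s_ll)


if __name__ == "__main__":
    print("limit        :", limiting_correlation())
    print("exact, n=1e5 :", exact_correlation(100_000))
    print("simulated    :", sample_correlation(20, 20_000))
```

For n = 2 the tree has a single coalescence time, so L = 2H and the correlation is exactly 1. The exact finite-n value then moves toward the limiting constant as n grows. Because correlation is scale-invariant, the choice of time units does not affect the result.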