Temple Seth D, Thompson Elizabeth A
Department of Statistics, University of Washington, Seattle, Washington, USA.
Department of Statistics, University of Michigan, Ann Arbor, Michigan, USA.
bioRxiv. 2025 Jan 7:2024.06.05.597656. doi: 10.1101/2024.06.05.597656.
If two haplotypes share the same alleles for an extended gene tract, these haplotypes are likely to be derived identical-by-descent from a recent common ancestor. Identity-by-descent segment lengths are correlated via unobserved ancestral tree and recombination processes, which commonly presents challenges to the derivation of theoretical results in population genetics. We show that the proportion of detectable identity-by-descent segments around a locus is normally distributed when the sample size and the scaled population size are large. We generalize this central limit theorem to cover flexible demographic scenarios, multi-way identity-by-descent segments, and multivariate identity-by-descent rates. We use efficient simulations to study the distributional behavior of the detectable identity-by-descent rate. One consequence of non-normality in finite samples is that a genome-wide scan looking for excess identity-by-descent rates may be subject to anti-conservative control of family-wise error rates.
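The family-wise error rate point in the last sentence can be illustrated with a toy calculation. This is a minimal sketch, not the authors' method, and it rests on three assumptions. First, purely for illustration, each locus's statistic has a right-skewed Gamma(k, 1) null distribution, standing in hypothetically for the finite-sample detectable IBD rate. Second, the scan standardizes this statistic and rejects above a Bonferroni-corrected normal quantile. Third, the tests are independent. The helper names `gamma_upper_tail` and `scan_error_rates` are hypothetical.

```python
import math
from statistics import NormalDist


def gamma_upper_tail(x: float, shape: int) -> float:
    """P(X > x) for X ~ Gamma(shape, scale=1) with integer shape.

    Uses the identity P(X > x) = P(Poisson(x) <= shape - 1) and sums the
    Poisson terms in log space so that large shapes do not underflow.
    """
    if x <= 0:
        return 1.0
    log_terms = [-x + i * math.log(x) - math.lgamma(i + 1) for i in range(shape)]
    return math.fsum(math.exp(t) for t in log_terms)


def scan_error_rates(shape: int, n_tests: int, fwer_target: float = 0.05):
    """Compare nominal and actual error rates of a normal-theory scan.

    Hypothetical setup: each of n_tests loci has a null statistic distributed
    as Gamma(shape, 1), a right-skewed stand-in for the detectable IBD rate.
    The scan standardizes it and rejects above a Bonferroni-corrected normal
    quantile.
    """
    per_test_alpha = fwer_target / n_tests
    z_crit = NormalDist().inv_cdf(1 - per_test_alpha)
    mean, sd = float(shape), math.sqrt(shape)
    # True per-test rejection probability at the normal-theory threshold.
    actual_per_test = gamma_upper_tail(mean + z_crit * sd, shape)
    # FWER assuming independent tests (a simplification; real loci are correlated).
    actual_fwer = 1 - (1 - actual_per_test) ** n_tests
    return per_test_alpha, actual_per_test, actual_fwer


for k in (4, 40, 400):
    nominal, actual, fwer = scan_error_rates(shape=k, n_tests=1000)
    print(f"shape={k:4d} nominal={nominal:.1e} actual={actual:.1e} FWER={fwer:.3f}")
```

Smaller `k` means stronger right skew. In this toy setup the realized FWER exceeds the 0.05 target, and the excess shrinks as `k` grows and the standardized statistic approaches normality.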