Department of Statistics and Data Sciences, National University of Singapore, Singapore 117546, Singapore.
Biocomplexity Institute, University of Virginia, Charlottesville, 22911, USA.
J Theor Biol. 2024 May 7;584:111794. doi: 10.1016/j.jtbi.2024.111794. Epub 2024 Mar 16.
Tree shape statistics based on peripheral structures have been used to study evolutionary mechanisms and inference methods. Motivated in part by a recent study by Pouryahya and Sankoff on modeling the accumulation of subgenomes in the evolution of polyploids, we present the distribution of subtree patterns with four or fewer leaves under the unrooted Proportional to Distinguishable Arrangements (PDA) model. We derive a recursive formula for computing the joint distributions of these patterns, together with a Strong Law of Large Numbers and a Central Limit Theorem for them. These results allow us to confirm several conjectures proposed by Pouryahya and Sankoff and to provide theoretical insight into their observations. Using their empirical datasets, we demonstrate that a statistical test based on the joint distribution could be more sensitive than tests based on a single subtree pattern in detecting evolutionary forces such as whole genome duplication.
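As a concrete illustration of the model (a minimal brute-force sketch, not the authors' recursive formula), the Python below works with the unrooted PDA model, which gives equal probability to each of the (2n−5)!! unrooted binary trees with n labelled leaves. Attaching leaf k to each of the 2k−3 edges of a tree on leaves 0..k−1 enumerates every such tree exactly once, and attaching it to one uniformly chosen edge samples a PDA tree. The sketch tallies only cherries (two leaves adjacent to the same internal node), the smallest subtree pattern. The adjacency-dict encoding and all function names (`attach_leaf`, `all_trees`, `random_pda_tree`, `cherry_count`, etc.) are hypothetical choices made for this example.

```python
import random
from collections import Counter
from fractions import Fraction
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical encoding: adjacency sets; leaves are 0..n-1, internal nodes are negative.
Tree = Dict[int, Set[int]]
Edge = Tuple[int, int]


def star3() -> Tree:
    """The unique unrooted binary tree on leaves 0, 1, 2 (one internal node, -1)."""
    return {-1: {0, 1, 2}, 0: {-1}, 1: {-1}, 2: {-1}}


def edges(tree: Tree) -> List[Edge]:
    """Each undirected edge once; a tree with k leaves has 2k - 3 of them."""
    return [(u, v) for u in tree for v in tree[u] if u < v]


def attach_leaf(tree: Tree, edge: Edge, leaf: int, new_node: int) -> Tree:
    """Return a copy of `tree` with `edge` subdivided by `new_node`, which carries `leaf`."""
    u, v = edge
    t = {node: set(nbrs) for node, nbrs in tree.items()}
    t[u].remove(v)
    t[v].remove(u)
    t[u].add(new_node)
    t[v].add(new_node)
    t[new_node] = {u, v, leaf}
    t[leaf] = {new_node}
    return t


def all_trees(n: int) -> Iterator[Tree]:
    """Yield each unrooted binary tree on leaves 0..n-1 exactly once (n >= 3)."""
    def grow(tree: Tree, k: int) -> Iterator[Tree]:
        if k == n:  # tree already carries leaves 0..n-1
            yield tree
            return
        for e in edges(tree):  # leaf k may go on any of the 2k - 3 edges
            yield from grow(attach_leaf(tree, e, k, -(k - 1)), k + 1)
    yield from grow(star3(), 3)


def random_pda_tree(n: int, rng: random.Random) -> Tree:
    """Sample one tree uniformly at random (unrooted PDA model), n >= 3."""
    tree = star3()
    for k in range(3, n):
        tree = attach_leaf(tree, rng.choice(edges(tree)), k, -(k - 1))
    return tree


def cherry_count(tree: Tree) -> int:
    """Internal nodes adjacent to exactly two leaves; each is one cherry when n >= 4."""
    return sum(
        1 for node, nbrs in tree.items()
        if node < 0 and sum(1 for x in nbrs if x >= 0) == 2
    )


def cherry_distribution(n: int) -> Counter:
    """Exact PDA distribution of the cherry count, as numbers of labelled trees."""
    return Counter(cherry_count(t) for t in all_trees(n))


def mean_cherries(n: int) -> Fraction:
    """Exact expected number of cherries under PDA, by full enumeration."""
    dist = cherry_distribution(n)
    return Fraction(sum(c * m for c, m in dist.items()), sum(dist.values()))


if __name__ == "__main__":
    print(sorted(cherry_distribution(6).items()))  # [(2, 90), (3, 15)]
    print(mean_cherries(6))                        # 15/7
    print(cherry_count(random_pda_tree(200, random.Random(1))))
```

For n = 6 the 105 trees split into 90 caterpillars with two cherries and 15 trees with three cherries, and the enumerated mean agrees with n(n−1)/(2(2n−5)), the known mean cherry count under this model. Full enumeration is only practical for small n; for large n the sampler is the natural tool, whereas results such as the paper's recursion and limit theorems avoid simulation altogether.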