Chao Anne, Jost Lou, Hsieh T C, Ma K H, Sherwin William B, Rollins Lee Ann
Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan.
EcoMinga Foundation, Via a Runtun, Baños, Tungurahua, Ecuador.
PLoS One. 2015 Jun 11;10(6):e0125471. doi: 10.1371/journal.pone.0125471. eCollection 2015.
Shannon entropy H and related measures are increasingly used in molecular ecology and population genetics because (1) unlike measures based on heterozygosity or allele number, these measures weight alleles in proportion to their frequency in the population, thus capturing a previously ignored aspect of allele frequency distributions that may be important in many applications; (2) these measures connect directly to the rich predictive mathematics of information theory; (3) Shannon entropy is completely additive and has an explicitly hierarchical nature; and (4) Shannon entropy-based differentiation measures obey strong monotonicity properties that heterozygosity-based measures lack. We derive simple new expressions for the expected values of the Shannon entropy of the equilibrium allele distribution at a neutral locus in a single isolated population under two models of mutation: the infinite allele model and the stepwise mutation model. Surprisingly, for each model the entropy of this complex stochastic system can be expressed as a simple combination of well-known mathematical functions. Moreover, for each model the entropy- and heterozygosity-based measures are linked by simple relationships that simulations show to be approximately valid even far from equilibrium. We also identify a bridge between the two models of mutation. We apply our approach to subdivided populations that follow the finite island model, obtaining the Shannon entropy of the equilibrium allele distributions of the subpopulations and of the total population. We also derive the expected mutual information and normalized mutual information ("Shannon differentiation") between subpopulations at equilibrium, and identify the model parameters that determine them. We apply our measures to data from the common starling (Sturnus vulgaris) in Australia. Our measures provide a test for neutrality that is robust to violations of equilibrium assumptions, as verified on real-world data from starlings.
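The abstract states the infinite-allele-model result only qualitatively. The sketch below is not the authors' code; it assumes the standard neutral equilibrium allele-frequency spectrum under the infinite allele model, φ(x) = θx⁻¹(1 − x)^(θ−1) with θ = 4Nμ, from which integrating −x ln x against φ gives E[H] = ψ(θ + 1) − ψ(1) (ψ is the digamma function; for integer θ this is the harmonic number 1 + 1/2 + … + 1/θ). Combined with the classical equilibrium heterozygosity θ/(1 + θ), this yields one simple entropy-heterozygosity link. The sketch also computes plug-in entropy, heterozygosity, and a normalized mutual information between subpopulations. All function names, the normalization by −Σ w ln w (ln N for equal weights), and the pure-Python digamma are illustrative assumptions, and plug-in estimates are biased for small samples.

```python
import math
from typing import Optional, Sequence


def _frequencies(counts: Sequence[float]) -> list:
    total = float(sum(counts))
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]


def shannon_entropy(counts: Sequence[float]) -> float:
    """Plug-in Shannon entropy (natural log) of an allele count vector."""
    # Zero-count alleles contribute 0 (limit of p ln p as p -> 0).
    return -sum(p * math.log(p) for p in _frequencies(counts) if p > 0)


def heterozygosity(counts: Sequence[float]) -> float:
    """Plug-in expected heterozygosity, 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in _frequencies(counts))


def digamma(x: float) -> float:
    """Digamma for x > 0: recurrence psi(x) = psi(x + 1) - 1/x, then asymptotic series."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    while x < 6.0:  # shift x upward until the asymptotic series is accurate
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    series = f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))
    return result + math.log(x) - 0.5 / x - series


def iam_expected_heterozygosity(theta: float) -> float:
    """Equilibrium expected heterozygosity under the IAM (assumed theta = 4N*mu)."""
    return theta / (1.0 + theta)


def iam_expected_entropy(theta: float) -> float:
    """E[H] = psi(theta + 1) - psi(1), from phi(x) = theta x^-1 (1 - x)^(theta - 1)."""
    return digamma(theta + 1.0) - digamma(1.0)


def entropy_from_heterozygosity_iam(h: float) -> float:
    """Substitute theta = h / (1 - h), i.e. theta + 1 = 1 / (1 - h), into E[H]."""
    if not 0.0 <= h < 1.0:
        raise ValueError("heterozygosity must lie in [0, 1)")
    return digamma(1.0 / (1.0 - h)) - digamma(1.0)


def shannon_differentiation(
    subpop_counts: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Mutual information between subpopulation and allele identity,
    divided by its maximum possible value -sum(w ln w) (hypothetical normalization)."""
    n = len(subpop_counts)
    if n < 2:
        raise ValueError("need at least two subpopulations")
    k = len(subpop_counts[0])
    if any(len(c) != k for c in subpop_counts):
        raise ValueError("all subpopulations must list the same alleles in the same order")
    w = [1.0 / n] * n if weights is None else [float(x) for x in weights]
    if len(w) != n or abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must have one entry per subpopulation and sum to 1")
    freqs = [_frequencies(c) for c in subpop_counts]
    # Allele frequencies of the pooled (total) population.
    pooled = [sum(w[j] * freqs[j][i] for j in range(n)) for i in range(k)]
    h_total = shannon_entropy(pooled)
    h_within = sum(w[j] * shannon_entropy(freqs[j]) for j in range(n))
    return (h_total - h_within) / shannon_entropy(w)
```

For example, θ = 1 gives an expected heterozygosity of 0.5 and an expected entropy of ψ(2) − ψ(1) = 1, and two subpopulations sharing no alleles give a normalized value of 1, while identical subpopulations give 0.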