Centre for Biological Diversity, University of St Andrews, Fife, UK.
Mol Ecol Resour. 2022 Aug;22(6):2183-2195. doi: 10.1111/1755-0998.13606. Epub 2022 Mar 24.
The measurement of biodiversity at all levels of organization is an essential first step toward understanding the ecological and evolutionary processes that drive spatial patterns of biodiversity. Ecologists have explored a wide range of summary statistics and have come to the view that information-based summary statistics, in particular the so-called Hill numbers, are a useful tool for measuring biodiversity. Population geneticists, on the other hand, have focused largely on summary statistics based on heterozygosity and on measures of allelic richness. However, recent studies have proposed the adoption of information-based summary statistics in population genetics. Here, we performed a comprehensive assessment of the power of this family of summary statistics to capture spatial patterns of genetic diversity, and we compared it with that of traditional population genetics approaches, namely measures based on allelic richness and heterozygosity. To give an unbiased evaluation, we used three machine learning methods to test how well different sets of summary statistics discriminate between spatial scenarios. We defined three distinct sets: (i) a set based on allelic richness measures, which included the Jaccard index; (ii) a set based on heterozygosity, which included FST; and (iii) a set based on Hill numbers derived from Shannon entropy, which included the recently proposed Shannon differentiation, ΔD. The results showed that the last of these performed as well as or, under some specific spatial scenarios, even better than the traditional population genetics measures. Interestingly, we found that a rarely or never used genetic differentiation measure based on allelic richness, Jaccard dissimilarity (J), showed the highest power to discriminate among spatial scenarios, followed by Shannon differentiation, ΔD. We concluded, therefore, that information-based measures as well as Jaccard dissimilarity represent excellent additions to the population genetics toolkit.
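As an illustration of two of the quantities named above, the following minimal sketch computes the Hill number of order 1 (the exponential of Shannon entropy) from the allele frequencies at a single locus, and the Jaccard dissimilarity between the allele sets of two populations. The function names and the toy inputs are hypothetical and are not taken from the study; the textbook definitions are assumed, and Shannon differentiation ΔD is not implemented here.

```python
import math


def hill_number_order1(allele_freqs):
    """Hill number of order q = 1: exp(Shannon entropy) of allele frequencies.

    Assumes the frequencies are non-negative and sum to 1 (hypothetical input).
    """
    entropy = -sum(p * math.log(p) for p in allele_freqs if p > 0)
    return math.exp(entropy)


def jaccard_dissimilarity(alleles_a, alleles_b):
    """1 - |A intersect B| / |A union B| for the allele sets of two populations."""
    a, b = set(alleles_a), set(alleles_b)
    union = a | b
    if not union:
        # Two empty allele sets are treated as identical (a convention chosen here).
        return 0.0
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    # Four equally frequent alleles give an effective number of alleles of 4.
    print(hill_number_order1([0.25, 0.25, 0.25, 0.25]))
    # The populations share one allele out of three observed in total.
    print(jaccard_dissimilarity({"A1", "A2"}, {"A2", "A3"}))
```

With equal frequencies the Hill number of order 1 equals the allele count, which is why it is often read as an "effective number of alleles"; Jaccard dissimilarity, by contrast, uses only allele presence or absence and ignores frequencies.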