Konopiński Maciej K
Institute of Nature Conservation, Polish Academy of Sciences, Kraków, Poland.
PeerJ. 2020 Jun 29;8:e9391. doi: 10.7717/peerj.9391. eCollection 2020.
The Shannon diversity index has been widely used in population genetics studies. Recently, it was proposed as a unifying measure of diversity at different levels, from genes and populations to whole species and ecosystems. The index, however, has been shown to be negatively biased at small sample sizes. Modifications to Shannon's original formula have been proposed to obtain an unbiased estimator.
In this study, the performance of four estimators of the Shannon index (Shannon's original formula and the estimators of Zahl, Chao and Shen, and Chao et al.) was tested on simulated microsatellite data. Both the simulations and the analysis of the results were performed in the R language environment. A new R function was created to calculate all four indices from data in the genind format.
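To make the difference between the estimators concrete, here is a minimal sketch of three of them, written in Python rather than as the R function described in the study. It assumes the usual textbook forms: the plug-in formula for Shannon's original index, Zahl's estimator as a first-order jackknife of that formula, and the Chao and Shen estimator as a coverage-adjusted, Horvitz-Thompson-weighted sum. The function names are hypothetical, the input is a plain list of allele counts at one locus rather than a genind object, and the Chao et al. estimator is omitted.

```python
import math


def shannon_mle(counts):
    """Plug-in (original) Shannon index from allele counts, natural log."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts)


def shannon_zahl(counts):
    """First-order jackknife of the plug-in index (assumed form of Zahl's estimator)."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n < 2:
        return 0.0
    h_full = shannon_mle(counts)
    # Removing any one of the c copies of allele i gives the same reduced sample,
    # so each leave-one-out value is weighted by c instead of being recomputed.
    loo_sum = 0.0
    for i, c in enumerate(counts):
        reduced = counts[:i] + [c - 1] + counts[i + 1:]
        loo_sum += c * shannon_mle(reduced)
    return n * h_full - (n - 1) / n * loo_sum


def shannon_chao_shen(counts):
    """Coverage-adjusted estimator (assumed form of the Chao and Shen estimator)."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    if singletons == n:
        # Common convention to avoid an estimated coverage of zero.
        singletons = n - 1
    coverage = 1 - singletons / n
    h = 0.0
    for c in counts:
        p = coverage * c / n
        # Divide by the probability that the allele is seen at least once.
        h -= p * math.log(p) / (1 - (1 - p) ** n)
    return h


# Illustrative allele counts at a single locus (made-up numbers).
example_counts = [5, 3, 1, 1]
estimates = {
    "original": shannon_mle(example_counts),
    "zahl": shannon_zahl(example_counts),
    "chao_shen": shannon_chao_shen(example_counts),
}
```

For a small sample with rare alleles, as in this example, both corrected estimators return a larger value than the plug-in formula, which is the direction expected when correcting a negative bias.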
Sample size dependence was detected in all the estimators analysed; however, the deviation from parametric values was substantially smaller in the derived measures than in Shannon's original formula. Error rate was negatively associated with population heterozygosity. Comparisons among loci showed that fast-mutating loci were less affected by the error, except in the case of Shannon's original estimator, which, in the smallest sample, was more strongly affected at loci with a higher number of alleles. The Zahl and Chao et al. estimators performed notably better than Shannon's original formula.
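The sample size dependence of the original index can be illustrated with a toy simulation. The sketch below is not the microsatellite simulation used in the study: it assumes a single locus with ten equally frequent alleles (so the parametric value is ln 10), draws random samples of different sizes, and averages the plug-in estimate. All names and parameter values are illustrative assumptions.

```python
import math
import random
from collections import Counter


def shannon_mle(counts):
    """Plug-in (original) Shannon index from allele counts, natural log."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts)


def mean_plugin_estimate(n_alleles, sample_size, replicates, rng):
    """Average plug-in estimate over repeated samples from equally frequent alleles."""
    total = 0.0
    for _ in range(replicates):
        sample = [rng.randrange(n_alleles) for _ in range(sample_size)]
        total += shannon_mle(list(Counter(sample).values()))
    return total / replicates


rng = random.Random(1)      # fixed seed so the run is repeatable
n_alleles = 10              # assumed number of equally frequent alleles
true_h = math.log(n_alleles)

# Mean deviation from the parametric value for each sample size.
bias_by_n = {
    n: mean_plugin_estimate(n_alleles, n, 500, rng) - true_h
    for n in (10, 50, 200)
}

for n, bias in bias_by_n.items():
    print(f"sample size {n:>3}: mean deviation {bias:+.3f}")
```

Under these assumptions the mean deviation is negative at every sample size and shrinks towards zero as the sample grows, consistent with the negative small-sample bias described above.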
The results of this study show that the original Shannon index should no longer be used as a measure of genetic diversity and should be replaced by Zahl's unbiased estimator.