Chiu Chun-Huo, Wang Yi-Ting, Walther Bruno A, Chao Anne
Institute of Statistics, National Tsing Hua University, Hsin-Chu 30043, Taiwan.
Master Program in Global Health and Development, College of Public Health and Nutrition, Taipei Medical University, 250 Wu-Hsing St., Taipei 110, Taiwan.
Biometrics. 2014 Sep;70(3):671-82. doi: 10.1111/biom.12200. Epub 2014 Jun 19.
It is difficult to accurately estimate species richness if there are many almost undetectable species in a hyper-diverse community. Practically, an accurate lower bound for species richness is preferable to an inaccurate point estimator. The traditional nonparametric lower bound developed by Chao (1984, Scandinavian Journal of Statistics 11, 265-270) for individual-based abundance data uses only the information on the rarest species (the numbers of singletons and doubletons) to estimate the number of undetected species in samples. Applying a modified Good-Turing frequency formula, we derive an approximate formula for the first-order bias of this traditional lower bound. The approximate bias is estimated by using additional information (namely, the numbers of tripletons and quadrupletons). This approximate bias can be corrected, and an improved lower bound is thus obtained. The proposed lower bound is nonparametric in the sense that it is universally valid for any species abundance distribution. A similar type of improved lower bound can be derived for incidence data. We test our proposed lower bounds on simulated data sets generated from various species abundance models. Simulation results show that the proposed lower bounds always reduce bias over the traditional lower bounds and improve accuracy (as measured by mean squared error) when the heterogeneity of species abundances is relatively high. We also apply the proposed new lower bounds to real data for illustration and for comparisons with previously developed estimators.
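The abstract gives no formulas, so the following is only an illustrative sketch of the two abundance-based lower bounds it describes. It assumes the commonly quoted forms: the traditional bound S_obs + f1^2 / (2 f2), and the improved bound that adds f3 / (4 f4) * max(f1 - f2 f3 / (2 f4), 0), where fk is the number of species observed exactly k times. The function names, the omission of small-sample factors such as (n - 1)/n, and the choice to raise an error when f2 or f4 is zero are assumptions of this sketch, not details taken from the paper, and the expressions should be checked against the published article before use.

```python
from collections import Counter


def frequency_counts(abundances):
    """Map k -> number of species observed exactly k times (k >= 1)."""
    return Counter(a for a in abundances if a > 0)


def chao1_lower_bound(abundances):
    """Traditional lower bound: uses only singletons (f1) and doubletons (f2)."""
    f = frequency_counts(abundances)
    s_obs = sum(f.values())          # number of species actually observed
    f1, f2 = f[1], f[2]
    if f2 == 0:
        # Assumption of this sketch: refuse rather than pick a fallback form.
        raise ValueError("no doubletons; this simple form is undefined")
    return s_obs + f1 * f1 / (2 * f2)


def improved_lower_bound(abundances):
    """Improved lower bound: adds a correction from tripletons and quadrupletons."""
    f = frequency_counts(abundances)
    f1, f2, f3, f4 = f[1], f[2], f[3], f[4]
    if f4 == 0:
        # Assumption of this sketch: refuse rather than pick a fallback form.
        raise ValueError("no quadrupletons; this simple form is undefined")
    # The correction is truncated at zero, so the improved bound is never
    # smaller than the traditional one.
    correction = (f3 / (4 * f4)) * max(f1 - f2 * f3 / (2 * f4), 0)
    return chao1_lower_bound(abundances) + correction


# Hypothetical sample: 8 singletons, 4 doubletons, 2 tripletons,
# 2 quadrupletons, and two common species (18 observed species in total).
sample = [1] * 8 + [2] * 4 + [3] * 2 + [4] * 2 + [10, 20]
print(chao1_lower_bound(sample))     # 26.0
print(improved_lower_bound(sample))  # 27.5
```

In this made-up sample the correction term is 0.25 * (8 - 2) = 1.5, so the improved bound sits above the traditional one, in line with the abstract's description of correcting the traditional bound's bias.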