Paynter Alex, Willis Amy D
Department of Biostatistics, University of Washington, Health Sciences Building, Box 357232, 1705 NE Pacific St., Seattle, WA 98195.
J Appl Stat. 2021;48(6):1053-1070. doi: 10.1080/02664763.2020.1754359. Epub 2020 Apr 19.
Our goal is to estimate the true number of classes in a population, called the species richness. We consider the case where multiple frequency count tables have been collected from a homogeneous population, and investigate a penalized maximum likelihood estimator under a negative binomial model. Because a high probability that a class goes unobserved increases the variance of species richness estimates, our method penalizes the probability of a class being unobserved. Tuning the penalization parameter is challenging because the true species richness is never known, so we propose and validate four novel methods for tuning it. We illustrate and contrast the performance of the proposed methods by estimating the strain-level microbial diversity of Lake Champlain over three consecutive years, and global human host-associated species-level microbial richness.
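The abstract does not state the exact likelihood or penalty, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' estimator. It assumes each class's count in a table follows a negative binomial with size r and mean μ shared across tables (the homogeneous-population setting), fits the zero-truncated likelihood of the observed frequency counts f_k summed over tables, and subtracts a hypothetical penalty λ·p0, where p0 = P(count = 0) is the probability of a class being unobserved. A coarse grid search stands in for a real optimizer, richness is estimated as the average over tables of c_j / (1 − p0) (also an assumption), and λ is fixed by hand: none of the four tuning methods is reproduced. All function names and the example tables are hypothetical.

```python
import math
from typing import Dict, List

# k -> f_k: number of classes observed exactly k times (k >= 1)
FreqTable = Dict[int, int]


def nb_log_pmf(k: int, size: float, mean: float) -> float:
    """Log P(X = k) for a negative binomial with size r and mean mu."""
    log_q = math.log(size / (size + mean))    # log(r / (r + mu))
    log_1mq = math.log(mean / (size + mean))  # log(mu / (r + mu))
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * log_q + k * log_1mq)


def penalized_objective(tables: List[FreqTable], size: float, mean: float, lam: float):
    """Zero-truncated NB log-likelihood summed over tables, minus the ASSUMED penalty lam * p0."""
    p0 = math.exp(nb_log_pmf(0, size, mean))  # probability a class is unobserved
    log_observed = math.log1p(-p0)            # log P(X >= 1)
    loglik = 0.0
    for table in tables:
        for k, f_k in table.items():
            loglik += f_k * (nb_log_pmf(k, size, mean) - log_observed)
    return loglik - lam * p0, p0


def geometric_grid(lo: float, hi: float, n: int) -> List[float]:
    step = (hi / lo) ** (1.0 / (n - 1))
    return [lo * step ** i for i in range(n)]


def fit_penalized_nb(tables: List[FreqTable], lam: float) -> Dict[str, float]:
    """Grid-search the penalized objective (a stand-in for a proper optimizer)."""
    best = None
    for r in geometric_grid(0.05, 20.0, 40):
        for mu in geometric_grid(0.1, 50.0, 40):
            obj, p0 = penalized_objective(tables, r, mu, lam)
            if best is None or obj > best[0]:
                best = (obj, r, mu, p0)
    _, r, mu, p0 = best
    observed = [sum(t.values()) for t in tables]  # c_j: classes seen in table j
    # ASSUMED combination rule: average the per-table estimates c_j / (1 - p0)
    richness = sum(c / (1.0 - p0) for c in observed) / len(observed)
    return {"size": r, "mean": mu, "p0": p0, "richness": richness}


if __name__ == "__main__":
    # Made-up frequency count tables, one per sample
    tables = [{1: 30, 2: 12, 3: 6, 4: 3, 6: 1},
              {1: 28, 2: 14, 3: 5, 5: 2, 7: 1},
              {1: 33, 2: 10, 3: 7, 4: 2}]
    for lam in (0.0, 100.0):
        fit = fit_penalized_nb(tables, lam)
        print(lam, round(fit["p0"], 3), round(fit["richness"], 1))
```

Raising λ can only lower (or keep) the fitted p0, and therefore the richness estimate, which is why λ trades variance against bias and why choosing it without knowing the true richness is the hard part.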