Interval estimation of the over-dispersion parameter in the analysis of one-way layout of count data.

Department of Mathematical Sciences, Central Connecticut State University, 1615 Stanley Street, New Britain, CT 06050, USA.

Stat Med. 2011 Jan 15;30(1):39-51. doi: 10.1002/sim.4061. Epub 2010 Sep 14.

The over-dispersion parameter is an important and versatile measure in the analysis of a one-way layout of count data in biological studies. For example, it is commonly used as an inverse measure of aggregation in biological count data. Its estimation from finite data sets is a recognized challenge. Many simulation studies have examined the bias and efficiency of different estimators of the over-dispersion parameter for finite data sets (see, for example, Clark and Perry, Biometrics 1989; 45:309-316 and Piegorsch, Biometrics 1990; 46:863-867), but little attention has been paid to the accuracy of its confidence intervals (CIs). In this paper, we first derive asymptotic procedures for constructing confidence limits for the over-dispersion parameter using four estimators that are specified by only the first two moments of the counts. We also obtain closed-form asymptotic variance formulae for these four estimators. In addition, we consider the asymptotic CI based on the maximum likelihood (ML) estimator under the negative binomial model. The simulation results indicate that the asymptotic CIs based on these five estimators have coverage below the nominal coverage probability. To remedy this, we also study the properties of the asymptotic CIs based on the restricted estimates of ML, extended quasi-likelihood, and double extended quasi-likelihood, which eliminate the nuisance parameter effect through the adjusted profile likelihood and quasi-likelihoods. These CIs are shown to outperform the competitors, providing coverage levels close to nominal over a wide range of parameter combinations. Two applications to biological count data are presented.
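To make "estimators specified by only the first two moments of the counts" concrete, here is a minimal sketch of one simple pooled moment estimator. It assumes the negative binomial mean-variance relationship Var(Y) = mu + c * mu^2, with group-specific means and a common over-dispersion parameter c. The abstract does not give the formulas of its four estimators, so the function name `moment_overdispersion` and the pooling scheme are illustrative assumptions, not the paper's estimators, and no confidence limits are computed here.

```python
from statistics import mean, variance


def moment_overdispersion(groups):
    """Pooled moment estimate of c, assuming Var(Y) = mu + c * mu**2.

    `groups` is a list of lists of counts, one inner list per group of the
    one-way layout. Each group needs at least two counts.
    Hypothetical helper for illustration only.
    """
    excess = 0.0      # sum over groups of (sample variance - sample mean)
    mean_sq = 0.0     # sum over groups of (sample mean)^2
    for counts in groups:
        m = mean(counts)
        s2 = variance(counts)   # unbiased sample variance
        excess += s2 - m        # variance beyond the Poisson value
        mean_sq += m * m
    return excess / mean_sq


# Two small made-up groups of counts (not data from the paper).
groups = [[0, 2, 4], [1, 5, 9]]
c_hat = moment_overdispersion(groups)
```

The estimate can be negative when the sample variance falls below the sample mean, which is one reason small-sample behaviour of such estimators, and of intervals built on them, needs care.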
