Gray G
Department of Statistics, North Carolina State University, Raleigh 27695-8203.
Biometrics. 1994 Jun;50(2):457-70.
A finite mixture is a distribution where a given observation can come from any of a finite set of components. That is, the density of the random variable X is of the form f(x) = π₁f₁(x) + π₂f₂(x) + ... + πₖfₖ(x), where the πᵢ are the mixing proportions and the fᵢ are the component densities. Mixture models are common in many areas of biology; the most commonly applied is a mixture of normal densities. Many of the problems with inference in the mixture setting are well known. Not so well documented, however, are the extreme biases that can occur in the maximum likelihood estimators (MLEs) when there is model misspecification. This paper shows that even the seemingly innocuous assumption of equal variances for the components of the mixture can lead to surprisingly large asymptotic biases in the MLEs of the parameters. Assuming normality when the underlying distributions are skewed can also lead to strong biases. We explicitly calculate the asymptotic biases when maximum likelihood is carried out assuming normality for several types of true underlying distribution. If the true distribution is a mixture of skewed components, then an application of the Box-Cox power transformation can reduce the asymptotic bias substantially. The power λ in the Box-Cox transformation is in this case treated as an additional parameter to be estimated. In many cases the bias can be reduced to acceptable levels, thus leading to meaningful inference. A modest Monte Carlo study gives an indication of the small-sample performance of inference procedures (including the power and level of likelihood ratio tests) based on a likelihood that incorporates estimation of λ. A real data example illustrates the method.
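The mixture density and the Box-Cox step described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the function names, the choice of normal components on the transformed scale, and the Jacobian form of the log-likelihood are assumptions made here for illustration, since the abstract does not give the exact parameterization.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, mus, sigmas):
    """f(x) = sum_i pi_i * f_i(x), with normal component densities f_i."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))


def box_cox(x, lam):
    """Box-Cox power transformation of a positive value x."""
    if abs(lam) < 1e-12:
        return math.log(x)  # limiting case as lam tends to 0
    return (x ** lam - 1.0) / lam


def transformed_mixture_loglik(data, lam, weights, mus, sigmas):
    """Log-likelihood of positive data, assuming (hypothetically) that the
    Box-Cox-transformed values follow a normal mixture.

    The term (lam - 1) * log(x) is the log-Jacobian of the transformation,
    which is what lets lam be compared across values on the original scale.
    """
    total = 0.0
    for x in data:
        y = box_cox(x, lam)
        total += math.log(mixture_pdf(y, weights, mus, sigmas))
        total += (lam - 1.0) * math.log(x)
    return total
```

Treating λ as an additional parameter then amounts to maximizing `transformed_mixture_loglik` jointly over λ, the mixing proportions, and the component means and standard deviations, for example with a grid over λ and a numerical optimizer or EM for the remaining parameters.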