Song Xin-Yuan, Pan Jun-Hao, Kwok Timothy, Vandenput Liesbeth, Ohlsson Claes, Leung Ping-Chung
Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong.
Biom J. 2010 Jun;52(3):314-32. doi: 10.1002/bimj.200900135.
In the development of structural equation models (SEMs), observed variables are usually assumed to be normally distributed. However, this assumption is often violated in practical research. Because the non-normality of observed variables in an SEM can arise from non-normal latent variables, non-normal residuals, or both, semiparametric modeling that leaves the distribution of the latent variables or the distribution of the residuals unspecified is needed. In this article, we find that an SEM becomes nonidentifiable when both the latent variable distribution and the residual distribution are unknown. Hence, it is impossible to reliably estimate both distributions without a parametric assumption on one or the other. We also find that the residuals in the measurement equation are more sensitive to the normality assumption than the latent variables, and that non-normality of the residuals has a more serious negative impact on the estimation of parameters and distributions. Therefore, when there is no prior knowledge about parametric distributions for either the latent variables or the residuals, we recommend making a parametric assumption on the latent variables and modeling the residuals nonparametrically. We propose a semiparametric Bayesian approach that uses the truncated Dirichlet process with a stick-breaking prior to handle the non-normality of residuals in the measurement equation. Simulation studies and a real data analysis demonstrate our findings and reveal the empirical performance of the proposed methodology. Free WinBUGS code to perform the analysis is available in the Supporting Information.
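To make the truncated Dirichlet process with a stick-breaking prior more concrete, the following is a minimal Python sketch of a single draw from such a prior for a residual distribution. It is not the authors' WinBUGS code and it does not perform posterior inference. The truncation level G, the concentration parameter alpha, the normal base distribution, and the normal kernel around each atom are all illustrative assumptions, as are the function names. The weights are built as pi_k = v_k * prod_{j<k} (1 - v_j) with v_k ~ Beta(1, alpha), and v_G is set to 1 so that the G weights sum to one.

```python
import random


def stick_breaking_weights(alpha, G, rng):
    """Return G truncated stick-breaking weights that sum to 1.

    alpha (concentration) and G (truncation level) are illustrative
    assumptions, not values taken from the article.
    """
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(G - 1):
        v = rng.betavariate(1.0, alpha)  # v_k ~ Beta(1, alpha)
        weights.append(v * remaining)    # pi_k = v_k * prod_{j<k} (1 - v_j)
        remaining *= 1.0 - v
    weights.append(remaining)  # truncation: v_G = 1, last atom takes the rest
    return weights


def draw_residuals(n, alpha=1.0, G=20, base_sd=2.0, kernel_sd=0.5, seed=0):
    """Draw n residuals from one realization of the truncated prior.

    Assumed setup (hypothetical, for illustration only): atom locations come
    from a N(0, base_sd^2) base distribution, and each residual is normal
    around its atom with standard deviation kernel_sd.
    """
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, G, rng)
    atoms = [rng.gauss(0.0, base_sd) for _ in range(G)]
    # Pick a mixture component for each residual according to the weights.
    labels = rng.choices(range(G), weights=weights, k=n)
    residuals = [rng.gauss(atoms[z], kernel_sd) for z in labels]
    return weights, atoms, residuals


if __name__ == "__main__":
    weights, atoms, residuals = draw_residuals(n=200)
    print("sum of weights:", sum(weights))
    print("largest weight:", max(weights))
    print("first residuals:", residuals[:5])
```

A smaller alpha concentrates the mass on a few atoms, giving clearly non-normal, multimodal residuals, while a larger alpha spreads the mass over many atoms. In a full analysis these quantities would be updated from the data rather than drawn once from the prior.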