Viechtbauer Wolfgang
University of Illinois at Urbana-Champaign, USA and University of Maastricht, The Netherlands.
Br J Math Stat Psychol. 2007 May;60(Pt 1):29-60. doi: 10.1348/000711005X64042.
Choice of the appropriate model in meta-analysis is often treated as an empirical question, answered by examining the amount of variability in the effect sizes. When all of the observed variability in the effect sizes can be accounted for by sampling error alone, the set of effect sizes is said to be homogeneous and a fixed-effects model is typically adopted. Whether a set of effect sizes is homogeneous is usually tested with the so-called Q test. In this paper, a variety of alternative homogeneity tests (the likelihood ratio, Wald and score tests) are compared with the Q test in terms of their Type I error rate and power for four different effect size measures. Monte Carlo simulations show that the Q test kept the tightest control of the Type I error rate, although the results emphasize the importance of large sample sizes within the set of studies. The results also suggest under what conditions the power of the tests can be considered adequate.
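For readers unfamiliar with the Q test mentioned in the abstract, the following is a minimal sketch of its usual textbook form, not code from the paper. It assumes each study supplies an effect size estimate and a known sampling variance, weights each study by the inverse of its variance, and refers Q to a chi-square distribution with k - 1 degrees of freedom under homogeneity. The function names `chi2_sf` and `q_test` and the example numbers are hypothetical, chosen only for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Uses the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with h = x / 2.
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-h)              # Q(1, h)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(h))   # Q(1/2, h)
    while a < df / 2.0:
        tail += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return tail


def q_test(effects, variances):
    """Return (Q, df, p) for the homogeneity test of k effect sizes.

    effects   : observed effect size estimates, one per study
    variances : their sampling variances (assumed known)
    """
    weights = [1.0 / v for v in variances]
    # Inverse-variance weighted mean effect under the fixed-effects model.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    return q, df, chi2_sf(q, df)


if __name__ == "__main__":
    # Hypothetical data: five standardized mean differences and their variances.
    effects = [0.10, 0.35, 0.22, 0.60, 0.05]
    variances = [0.04, 0.03, 0.05, 0.06, 0.02]
    q, df, p = q_test(effects, variances)
    print(f"Q = {q:.3f}, df = {df}, p = {p:.4f}")
```

A small p-value suggests more variability among the effect sizes than sampling error alone would produce. As the abstract notes, the chi-square reference distribution is only approximate, so the within-study sample sizes affect how well the nominal Type I error rate is maintained.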