Department of Psychology, Northumbria University, Newcastle upon Tyne, United Kingdom.
PLoS One. 2022 Feb 3;17(2):e0262809. doi: 10.1371/journal.pone.0262809. eCollection 2022.
Meta-analyses typically quantify the heterogeneity of results, thus providing information about the consistency of the investigated effect across studies. Numerous heterogeneity estimators have been devised. Past evaluations of their performance typically presumed an absence of bias in the set of studies being meta-analysed, which is often unrealistic. The present study used computer simulations to evaluate five heterogeneity estimators under a range of research conditions broadly representative of meta-analyses in psychology, with the aim of assessing the impact of biases in sets of primary studies on estimates of both mean effect size and heterogeneity in meta-analyses of continuous outcome measures. To this end, six orthogonal design factors were manipulated: strength of publication bias; one-tailed vs. two-tailed publication bias; prevalence of p-hacking; true heterogeneity of the studied effect; true average size of the studied effect; and number of studies per meta-analysis. Our results showed that biases in sets of primary studies caused much greater problems for the estimation of effect size than for the estimation of heterogeneity. For the latter, estimation bias remained small or moderate under most circumstances. Effect size estimates were virtually unaffected by the choice of heterogeneity estimator. For heterogeneity estimates, however, relevant differences emerged. For unbiased sets of primary studies, the REML estimator and (to a lesser extent) the Paule-Mandel estimator performed well in terms of bias and variance. In biased sets of primary studies, however, the Paule-Mandel estimator performed poorly, whereas the DerSimonian-Laird estimator and (to a slightly lesser extent) the REML estimator performed well. The complexity of the results notwithstanding, we suggest that the REML estimator remains a good choice for meta-analyses of continuous outcome measures across varied circumstances.
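The abstract names the DerSimonian-Laird, Paule-Mandel, and REML estimators without showing the underlying computations. As an illustration only (not the authors' simulation code), the following Python sketch generates a biased set of primary studies under a crude one-tailed publication-bias model and computes the DerSimonian-Laird and REML estimates of the between-study variance tau^2; all parameter values (k, delta, tau2, n, pub_bias) and helper names are assumptions chosen for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)

def simulate_biased_studies(k=30, delta=0.3, tau2=0.1, n=40, pub_bias=0.9):
    """Draw k two-group studies (n per arm) of a standardized mean
    difference; with probability pub_bias, suppress any study whose
    one-tailed p-value exceeds .05 (a crude publication-bias model)."""
    y, v = [], []
    while len(y) < k:
        theta = rng.normal(delta, np.sqrt(tau2))  # study-level true effect
        g1 = rng.normal(theta, 1.0, n)            # treatment arm
        g0 = rng.normal(0.0, 1.0, n)              # control arm
        sp = np.sqrt((g1.var(ddof=1) + g0.var(ddof=1)) / 2)
        d = (g1.mean() - g0.mean()) / sp          # Cohen's d
        vi = 2 / n + d ** 2 / (4 * n)             # approx. sampling variance
        significant = d / np.sqrt(vi) > 1.645     # one-tailed z test, alpha = .05
        if significant or rng.random() > pub_bias:
            y.append(d)
            v.append(vi)
    return np.array(y), np.array(v)

def tau2_dersimonian_laird(y, v):
    """Closed-form DerSimonian-Laird moment estimator of tau^2."""
    w = 1.0 / v
    q = np.sum(w * (y - np.sum(w * y) / np.sum(w)) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (q - (len(y) - 1)) / c)

def tau2_reml(y, v):
    """REML estimator of tau^2 via direct maximization of the
    restricted log-likelihood (additive constants dropped)."""
    def neg_rll(tau2):
        w = 1.0 / (v + tau2)
        mu = np.sum(w * y) / np.sum(w)
        return 0.5 * (np.sum(np.log(v + tau2))
                      + np.log(np.sum(w))
                      + np.sum(w * (y - mu) ** 2))
    return minimize_scalar(neg_rll, bounds=(0.0, 10.0), method="bounded").x

y, v = simulate_biased_studies()
print("DL   tau^2:", tau2_dersimonian_laird(y, v))
print("REML tau^2:", tau2_reml(y, v))
```

A production analysis would rely on an established implementation such as R's metafor package rather than hand-rolled estimators; the sketch above only makes the two tau^2 computations concrete.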