Am J Epidemiol. 2016 Nov 15;184(10):770-778. doi: 10.1093/aje/kww098.
Standardization procedures are commonly used to combine phenotype data that were measured using different instruments, but there is little information on how the choice of standardization method influences pooled estimates and heterogeneity. Heterogeneity is of key importance in meta-analyses of observational studies because it affects the statistical models used and the decision of whether or not it is appropriate to calculate a pooled estimate of effect. Using 2-stage individual participant data analyses, we compared 2 common methods of standardization, T-scores and category-centered scores, to create combinable memory scores using cross-sectional data from 3 Canadian population-based studies (the Canadian Study on Health and Aging (1991-1992), the Canadian Community Health Survey on Healthy Aging (2008-2009), and the Quebec Longitudinal Study on Nutrition and Aging (2004-2005)). A simulation was then conducted to assess the influence of varying the following items across population-based studies: 1) effect size, 2) distribution of confounders, and 3) the relationship between confounders and the outcome. We found that pooled estimates based on the unadjusted category-centered scores tended to be larger than those based on the T-scores, although the differences were negligible when adjusted scores were used, and that most individual participant data meta-analyses identified significant heterogeneity. The results of the simulation suggested that in terms of heterogeneity, the method of standardization played a smaller role than did different effect sizes across populations and differential confounding of the outcome measure across studies. Although there was general consistency between the 2 types of standardization methods, the simulations identified a number of sources of heterogeneity, some of which are not the usual sources considered by researchers.
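As a rough illustration of the workflow described above, the following sketch rescales raw scores to T-scores within a study and then combines per-study estimates in a second stage, reporting Cochran's Q and I² as heterogeneity measures. It makes several assumptions: the T-score uses the conventional definition (mean 50, SD 10); stage 1 is reduced to an unadjusted difference in means between two groups; stage 2 uses fixed-effect inverse-variance weighting; and the function names are hypothetical. Only the T-score is sketched, because the abstract does not define the category-centered score, and none of this is the authors' actual code or model.

```python
import math
from statistics import mean, stdev, variance


def t_scores(raw):
    """Rescale one study's raw scores to mean 50 and SD 10 (conventional T-score)."""
    m, s = mean(raw), stdev(raw)
    return [50 + 10 * (x - m) / s for x in raw]


def mean_difference(group_a, group_b):
    """Stage 1 (simplified): unadjusted difference in means and its standard error."""
    diff = mean(group_a) - mean(group_b)
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return diff, se


def pool_fixed(estimates, std_errors):
    """Stage 2: inverse-variance pooled estimate, Cochran's Q, and I^2 (percent)."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2


if __name__ == "__main__":
    # Made-up raw memory scores from two hypothetical studies using different
    # instruments; the first half of each list is the "exposed" group.
    studies = [
        [12, 15, 11, 14, 18, 20, 17, 19],
        [31, 40, 35, 38, 44, 47, 41, 45],
    ]
    estimates, std_errors = [], []
    for raw in studies:
        scores = t_scores(raw)
        half = len(scores) // 2
        diff, se = mean_difference(scores[:half], scores[half:])
        estimates.append(diff)
        std_errors.append(se)
    pooled, q, i2 = pool_fixed(estimates, std_errors)
    print(f"pooled={pooled:.2f}  Q={q:.2f}  I2={i2:.1f}%")
```

A large I² in this kind of analysis would indicate that between-study variation exceeds what sampling error alone would produce, which is the situation in which a random-effects model, or no pooled estimate at all, might be preferred.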