Department of Health Research and Policy, Stanford University, Stanford, California, USA.
Genet Epidemiol. 2011 Nov;35(7):581-91. doi: 10.1002/gepi.20603. Epub 2011 Sep 15.
Meta-analysis of genome-wide association studies involves testing single nucleotide polymorphisms (SNPs) using summary statistics that are weighted sums of site-specific score or Wald statistics. This approach avoids having to pool individual-level data. We describe the weights that maximize the power of the summary statistics. For small effect-sizes, any choice of weights yields summary Wald and score statistics with the same power, and the optimal weights are proportional to the square roots of the sites' Fisher information for the SNP's regression coefficient. When SNP effect size is constant across sites, the optimal summary Wald statistic is the well-known inverse-variance-weighted combination of estimated regression coefficients, divided by its standard deviation. We give simple approximations to the optimal weights for various phenotypes, and show that weights proportional to the square roots of study sizes are suboptimal for data from case-control studies with varying case-control ratios, for quantitative trait data when the trait variance differs across sites, for count data when the site-specific mean counts differ, and for survival data with different proportions of failing subjects. Simulations suggest that weights that accommodate intersite variation in imputation error give little power gain compared to those obtained ignoring imputation uncertainties. We note advantages to combining site-specific score statistics, and we show how they can be used to assess effect-size heterogeneity across sites. The utility of the summary score statistic is illustrated by application to a meta-analysis of schizophrenia data in which only site-specific P-values and directions of association are available.
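As a rough illustration of the statistics described in the abstract, the sketch below computes an inverse-variance-weighted summary Wald statistic and a weighted summary Z-score built only from site-specific two-sided P-values and directions of association. All function names are hypothetical and are not taken from the paper. The case-control weight shown is a commonly used effective-sample-size approximation, assumed here for illustration and not necessarily the approximation the authors derive.

```python
import math
from statistics import NormalDist

_NORM = NormalDist()  # standard normal distribution


def inverse_variance_wald(betas, ses):
    """Inverse-variance-weighted combination of site estimates,
    divided by its standard deviation (constant effect size assumed)."""
    inv_var = [1.0 / se ** 2 for se in ses]
    total = sum(inv_var)
    beta_hat = sum(w * b for w, b in zip(inv_var, betas)) / total
    return beta_hat / math.sqrt(1.0 / total)


def weighted_z_from_p(p_values, directions, weights):
    """Weighted sum of signed site Z-scores, scaled to unit variance
    under the null. Z-scores are recovered from two-sided P-values
    and directions of association (+1 or -1)."""
    zs = [d * _NORM.inv_cdf(1.0 - p / 2.0)
          for p, d in zip(p_values, directions)]
    numerator = sum(w * z for w, z in zip(weights, zs))
    return numerator / math.sqrt(sum(w * w for w in weights))


def two_sided_p(z):
    """Two-sided P-value for a standard normal test statistic."""
    return 2.0 * (1.0 - _NORM.cdf(abs(z)))


def case_control_weight(n_cases, n_controls):
    """ASSUMPTION: weight proportional to the square root of an
    effective sample size, n_cases * n_controls / (n_cases + n_controls).
    A common approximation, not necessarily the paper's formula."""
    return math.sqrt(n_cases * n_controls / (n_cases + n_controls))


if __name__ == "__main__":
    # Made-up numbers for two sites, purely for illustration.
    betas, ses = [0.10, 0.20], [0.05, 0.10]
    z_wald = inverse_variance_wald(betas, ses)

    # Same sites, but using only P-values, directions, and weights 1/SE
    # (the square root of each site's approximate information).
    p_values = [two_sided_p(b / s) for b, s in zip(betas, ses)]
    directions = [1 if b > 0 else -1 for b in betas]
    z_from_p = weighted_z_from_p(p_values, directions,
                                 [1.0 / s for s in ses])
    print(z_wald, z_from_p, two_sided_p(z_wald))
```

With weights equal to the reciprocals of the standard errors, the weighted Z-score and the inverse-variance Wald statistic coincide algebraically, which is why the two printed statistics agree up to floating-point error. The `case_control_weight` helper also shows why the square root of study size can mislead: two sites with the same total size but different case-control ratios receive different weights.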