Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan.
Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark.
Genet Epidemiol. 2019 Jul;43(5):462-476. doi: 10.1002/gepi.22197. Epub 2019 Feb 22.
With the availability of large-scale biobanks, genome-wide-scale phenome-wide association studies (PheWAS) are proving instrumental in discovering novel genetic variants associated with clinical phenotypes. As an increasing number of such association results from different biobanks become available, methods to meta-analyse them are of great interest. Because binary phenotypes in biobank-based studies mostly have unbalanced case-control ratios, very few methods provide well-calibrated association tests. For example, traditional Z-score-based meta-analysis often yields conservative or anticonservative Type I error rates in such unbalanced scenarios. We propose two meta-analysis strategies, based on the saddlepoint approximation (SPA) score test, that efficiently combine association results from biobank-based studies with such unbalanced phenotypes. Our first method involves sharing the overall genotype counts from each study; the second involves sharing an approximation of the distribution of the score test statistic from each study using cubic Hermite splines. We compare the proposed methods with a traditional Z-score-based meta-analysis strategy using numerical simulations and real data applications, and demonstrate their superior Type I error control.
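The central contrast in the abstract is between a normal (Z-score) approximation to the null distribution of the score statistic, which is poorly calibrated for unbalanced case-control ratios, and a saddlepoint approximation built from the cumulant generating function (CGF). The Python sketch below illustrates that contrast for a meta-analysis of independent studies. It relies on one standard fact: the CGF of a sum of independent score statistics is the sum of the per-study CGFs. It is not the authors' implementation. It assumes subject-level genotypes g_i and null-model fitted probabilities mu_i (0 < mu_i < 1) are available for every study, which is exactly the information the two proposed strategies (shared genotype counts, or spline summaries of the score distribution) are designed to avoid exchanging. The function names (`cgf`, `meta_pvalues`), the bisection solver, the 2.0 cutoff for falling back to the normal approximation, and the search range of ±50 are illustrative choices. The tail formula is the Barndorff-Nielsen form of the saddlepoint approximation.

```python
import math
from typing import Optional, Sequence, Tuple

# Hypothetical input format: one study = (g, mu), per-subject genotypes and null fitted probabilities.
Study = Tuple[Sequence[float], Sequence[float]]


def _norm_cdf(x: float) -> float:
    """Standard normal CDF via erfc, which stays accurate far into the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def _expit(x: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cgf(studies: Sequence[Study], t: float) -> Tuple[float, float, float]:
    """K(t), K'(t), K''(t) of the total score S = sum_k S_k under H0.

    With S_k = sum_i g_i (y_i - mu_i) and independent y_i ~ Bernoulli(mu_i),
    K_k(t) = sum_i [log(1 - mu_i + mu_i e^{g_i t}) - g_i mu_i t]; independent
    studies make the CGF of the total equal to the sum of per-study CGFs.
    """
    k0 = k1 = k2 = 0.0
    for genos, mus in studies:
        for g, mu in zip(genos, mus):
            eta = math.log(mu / (1.0 - mu))  # logit(mu_i)
            p = _expit(eta + g * t)          # exponentially tilted case probability
            k0 += math.log1p(-mu) + _softplus(eta + g * t) - g * mu * t
            k1 += g * (p - mu)
            k2 += g * g * p * (1.0 - p)
    return k0, k1, k2


def _saddlepoint(studies: Sequence[Study], q: float,
                 t_max: float = 50.0, iters: int = 100) -> Optional[float]:
    """Solve K'(zeta) = q by bisection (K' is increasing since K'' > 0).
    Returns None if q lies outside K'([-t_max, t_max]), i.e. beyond the support."""
    lo, hi = -t_max, t_max
    if not cgf(studies, lo)[1] < q < cgf(studies, hi)[1]:
        return None
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cgf(studies, mid)[1] < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def _spa_tail(studies: Sequence[Study], q: float, upper: bool) -> float:
    """Barndorff-Nielsen saddlepoint approximation to P(S > q) or P(S < q)."""
    zeta = _saddlepoint(studies, q)
    if zeta is None:
        return 0.0
    k0, _, k2 = cgf(studies, zeta)
    w = math.copysign(math.sqrt(2.0 * (zeta * q - k0)), zeta)
    v = zeta * math.sqrt(k2)
    r = w + math.log(v / w) / w
    return _norm_cdf(-r) if upper else _norm_cdf(r)


def meta_pvalues(studies: Sequence[Study], scores: Sequence[float],
                 cutoff: float = 2.0) -> Tuple[float, float]:
    """Two-sided (SPA, normal) p-values for the combined score sum(scores).

    The normal p-value matches a Z-score meta-analysis that weights each
    z_k = S_k / sqrt(V_k) by sqrt(V_k): the combined z is sum S_k / sqrt(sum V_k).
    """
    s = sum(scores)
    z = s / math.sqrt(cgf(studies, 0.0)[2])  # Var(S) = K''(0)
    p_normal = 2.0 * _norm_cdf(-abs(z))
    if abs(z) < cutoff:  # near the mean the normal approximation is adequate
        return p_normal, p_normal
    q = abs(s)
    p_spa = _spa_tail(studies, q, upper=True) + _spa_tail(studies, -q, upper=False)
    return min(p_spa, 1.0), p_normal


# Hypothetical rare variant: 100 carriers (g = 1), 1900 non-carriers (g = 0), all mu = 0.01.
# 5 carrier cases observed vs. 1 expected, so S = 5 * 0.99 - 95 * 0.01 = 4.
study = ([1.0] * 100 + [0.0] * 1900, [0.01] * 2000)
p_spa, p_norm = meta_pvalues([study], [4.0])
# Same subjects split into two hypothetical studies with 3 and 2 carrier cases.
half = ([1.0] * 50 + [0.0] * 950, [0.01] * 1000)
p_spa_split, p_norm_split = meta_pvalues([half, half], [2.5, 1.5])
```

In this hypothetical example the normal p-value falls below the exact binomial tail P(X ≥ 5) by more than an order of magnitude (anticonservative), while the SPA value lands much closer. Splitting the same subjects into two studies leaves both p-values essentially unchanged, because only the summed CGF and summed variance enter the calculation.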