Jin Qinqin, Shi Gang
State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, China.
Applied Science College, Taiyuan University of Science and Technology, Taiyuan, China.
Front Genet. 2020 Jan 30;10:1400. doi: 10.3389/fgene.2019.01400. eCollection 2019.
Meta-analysis, which combines the results of multiple studies, is an important analytical method in genome-wide association studies (GWAS). In practice, the studies combined in a GWAS meta-analysis may have overlapping data (shared subjects), which can yield false positive results. Recent studies have proposed models that handle overlapping data when testing the genetic main effect of a single nucleotide polymorphism (SNP). However, no meta-analysis method yet exists for testing gene-environment interaction when overlapping data are present. Inspired by the methods for testing the genetic main effect with overlapping data, we proposed an overlapping meta-regression method to address this issue when testing gene-environment interaction. We generalized the covariance matrix of the regular meta-regression model by employing Lin's and Han's correlation structures to incorporate the correlations introduced by the overlapping data. Based on the proposed models, we further provided statistical significance tests for the gene-environment interaction and for the joint effect of the genetic main effect and the interaction. Through simulations, we examined the type I error and statistical power of the proposed methods at different levels of data overlap among studies. We demonstrated that our method controls the type I error well while achieving statistical power comparable to that of the splitting method, which removes overlapping samples before the meta-analysis. In contrast, ignoring overlapping data inflates the type I error. Unlike the splitting method, which requires individual-level genotype and phenotype data, our proposed method for testing gene-environment interaction handles overlapping data effectively and with high statistical efficiency at the meta-analysis level.
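To make the idea of a meta-regression with an overlap-adjusted covariance matrix concrete, here is a minimal sketch. It is not the authors' implementation. It assumes that each study k reports a SNP effect estimate beta_k with standard error se_k, that the environment enters as a study-level covariate env_k, and that the between-study correlation follows the common quantitative-trait approximation r_kl = n_kl / sqrt(n_k n_l), where n_kl is the number of shared subjects. This approximation stands in for, and is not claimed to reproduce, Lin's and Han's correlation structures. The function names `overlap_correlation` and `overlap_meta_regression` are hypothetical. The model is fitted by generalized least squares (GLS). A 1-df Wald test is used for the interaction slope b1, and a 2-df Wald test for the joint hypothesis b0 = b1 = 0.

```python
import math

import numpy as np


def overlap_correlation(n, n_shared):
    """Hypothetical helper: correlation of study estimates induced by shared subjects.

    Assumption (not the paper's exact structure): r_kl = n_kl / sqrt(n_k * n_l),
    where n_shared[k][l] is the number of subjects studies k and l have in common.
    """
    n = np.asarray(n, dtype=float)
    r = np.asarray(n_shared, dtype=float) / np.sqrt(np.outer(n, n))
    np.fill_diagonal(r, 1.0)  # each study is perfectly correlated with itself
    return r


def overlap_meta_regression(beta, se, env, corr):
    """GLS meta-regression beta_k = b0 + b1 * env_k + e_k with Cov(e) = D R D.

    D = diag(se) and R = corr. Returns the estimates (b0, b1), their covariance,
    the p-value of the 1-df interaction test (b1 = 0) and the p-value of the
    2-df joint test (b0 = b1 = 0).
    """
    beta = np.asarray(beta, dtype=float)
    se = np.asarray(se, dtype=float)
    X = np.column_stack([np.ones_like(beta), np.asarray(env, dtype=float)])
    V = np.outer(se, se) * np.asarray(corr, dtype=float)  # covariance of study estimates
    V_inv = np.linalg.inv(V)
    cov_b = np.linalg.inv(X.T @ V_inv @ X)  # covariance of (b0, b1)
    b = cov_b @ X.T @ V_inv @ beta          # GLS estimates (b0, b1)

    # 1-df Wald test of the gene-environment interaction: chi2 survival = erfc(sqrt(x/2))
    chi2_int = b[1] ** 2 / cov_b[1, 1]
    p_int = math.erfc(math.sqrt(chi2_int / 2.0))
    # 2-df joint Wald test of main effect and interaction: chi2 survival = exp(-x/2)
    chi2_joint = float(b @ np.linalg.solve(cov_b, b))
    p_joint = math.exp(-chi2_joint / 2.0)
    return b, cov_b, p_int, p_joint
```

For example, with four studies of sizes `[1000, 1200, 800, 1500]`, where studies 0 and 1 share 300 subjects and studies 2 and 3 share 200, you would build `R = overlap_correlation(n, shared)` and call `overlap_meta_regression(beta, se, env, R)`. Passing `np.eye(len(beta))` instead of `R` corresponds to ignoring the overlap. That reduces the fit to ordinary inverse-variance weighted meta-regression, which understates the variance when studies share subjects.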