Department of Psychiatry, Dalhousie University, Halifax, Nova Scotia, Canada.
PLoS One. 2013 Oct 11;8(10):e76295. doi: 10.1371/journal.pone.0076295. eCollection 2013.
Phenotypic misclassification (between cases) has been shown to reduce the power to detect association in genetic studies. However, it is conceivable that complex traits are heterogeneous with respect to individual genetic susceptibility and disease pathophysiology, and that the effect of this heterogeneity is larger than that of phenotyping errors. Although heterogeneity is an intuitively clear concept, its effect on genetic studies of common diseases has received little attention. Here we investigate the impact of phenotypic and genetic heterogeneity on the statistical power of genome-wide association studies (GWAS). We first performed a study of simulated genotypic and phenotypic data. Next, we analyzed the Wellcome Trust Case-Control Consortium (WTCCC) data for diabetes mellitus (DM) type 1 (T1D) and type 2 (T2D), using varying proportions of each type of diabetes to examine the impact of heterogeneity on the strength and statistical significance of associations previously found in the WTCCC data. In both simulated and real data, heterogeneity (the presence of "non-cases") reduced the statistical power to detect genetic association and greatly decreased the estimates of risk attributed to genetic variation. This finding was also supported by the analysis of loci validated in subsequent large-scale meta-analyses. For example, heterogeneity of 50% increases the required sample size approximately threefold. These results suggest that accurate phenotype delineation may be more important for detecting true genetic associations than an increase in sample size.
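The dilution effect described in the abstract can be illustrated with a minimal sketch. It is not the simulation or the WTCCC analysis used in the study. It assumes a simple allelic case-control model in which a fraction `h` of the case group consists of "non-cases" who carry the risk allele at the control frequency, and it uses the standard normal-approximation sample-size formula for comparing two proportions. The function names, the allele frequencies, and the significance threshold of 5e-8 are all illustrative assumptions. Under this simplified model the required sample size grows roughly as 1/(1-h)^2, so it is not expected to reproduce the paper's figures exactly.

```python
from statistics import NormalDist


def diluted_case_frequency(p_case, p_control, h):
    """Risk-allele frequency in a case group where a fraction h are non-cases.

    Assumption: non-cases carry the allele at the control frequency.
    """
    return (1 - h) * p_case + h * p_control


def odds_ratio(p_case, p_control):
    """Allelic odds ratio for the given case and control allele frequencies."""
    return (p_case / (1 - p_case)) / (p_control / (1 - p_control))


def alleles_per_group(p_case, p_control, alpha=5e-8, power=0.8):
    """Approximate number of alleles per group for a two-sided two-proportion test.

    Uses the usual normal approximation with equal group sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_case + p_control) / 2
    pooled_sd = (2 * p_bar * (1 - p_bar)) ** 0.5
    unpooled_sd = (p_case * (1 - p_case) + p_control * (1 - p_control)) ** 0.5
    return (z_alpha * pooled_sd + z_beta * unpooled_sd) ** 2 / (p_case - p_control) ** 2


if __name__ == "__main__":
    p_control, p_true_case = 0.30, 0.35  # hypothetical allele frequencies
    baseline = alleles_per_group(p_true_case, p_control)
    for h in (0.0, 0.25, 0.5):
        p_obs = diluted_case_frequency(p_true_case, p_control, h)
        inflation = alleles_per_group(p_obs, p_control) / baseline
        print(f"h={h:.2f}  observed OR={odds_ratio(p_obs, p_control):.3f}  "
              f"sample-size inflation={inflation:.2f}x")
```

Running the script prints the observed odds ratio and the sample-size inflation for a few levels of heterogeneity. Both move in the direction the abstract reports: the apparent effect shrinks toward 1 and the sample needed to keep the same power rises steeply.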