Benstock Sarah E, Weaver Katherine, Hettema John, Verhulst Brad
Res Sq. 2024 Jan 31:rs.3.rs-3858178. doi: 10.21203/rs.3.rs-3858178/v1.
Genome-wide association studies (GWAS) are underpowered because single nucleotide polymorphisms (SNPs) have small effects on phenotypes and because multiple-testing thresholds are extreme. The most common approach to increasing statistical power is to increase sample size. We propose an alternative strategy: redefining case-control outcomes as ordinal case-subthreshold-asymptomatic variables. While maintaining the clinical case threshold, we subdivide controls into two groups: individuals who are symptomatic but do not meet the clinical criteria for diagnosis (subthreshold) and individuals who are effectively asymptomatic. We conducted a simulation study to examine the impact of effect size, minor allele frequency, population prevalence, and the prevalence of the subthreshold group on statistical power to detect genetic associations in three scenarios: a standard case-control analysis, an ordinal analysis, and a case-asymptomatic control analysis. Our results suggest that the ordinal model consistently provides the most statistical power and the case-control model the least. Power in the case-asymptomatic control model resembles that of either the case-control or the ordinal model, depending on the population prevalence and the size of the subthreshold category. We then analyzed a major depression phenotype from the UK Biobank to corroborate our simulation results. Overall, the ordinal model improves statistical power in GWAS to a degree consistent with increasing the sample size by approximately 10%.
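The three-way comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code and makes several assumptions that the abstract does not state: outcomes arise from a liability-threshold model with an additive SNP effect, and a simple linear-by-linear trend test, (n - 1) r^2 on one degree of freedom, stands in for the logistic and ordinal regression models. All function names, parameter names, and default values are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_sample(n, maf, beta, prevalence, subthreshold, rng):
    """Return (genotypes, outcomes); outcomes: 0=asymptomatic, 1=subthreshold, 2=case.

    Assumption: a liability-threshold model with an additive SNP effect `beta`
    (in liability SD units per minor allele).
    """
    if not 0.0 < prevalence + subthreshold < 1.0:
        raise ValueError("prevalence + subthreshold must lie strictly between 0 and 1")
    var_g = 2.0 * maf * (1.0 - maf)
    if beta ** 2 * var_g >= 1.0:
        raise ValueError("beta is too large for a unit-variance liability")
    case_cut = STD_NORMAL.inv_cdf(1.0 - prevalence)
    sub_cut = STD_NORMAL.inv_cdf(1.0 - prevalence - subthreshold)
    resid_sd = math.sqrt(1.0 - beta ** 2 * var_g)  # keeps total liability variance at 1
    genotypes, outcomes = [], []
    for _ in range(n):
        g = (rng.random() < maf) + (rng.random() < maf)  # additive 0/1/2 coding
        liability = beta * (g - 2.0 * maf) + rng.gauss(0.0, resid_sd)
        genotypes.append(g)
        outcomes.append(2 if liability > case_cut else 1 if liability > sub_cut else 0)
    return genotypes, outcomes


def trend_chi2(x, y):
    """Linear-by-linear association statistic (n - 1) * r^2, referred to chi-square(1)."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # no variation in genotype or outcome: nothing to test
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (n - 1) * sxy ** 2 / (sxx * syy)


def estimate_power(n=1000, maf=0.3, beta=0.1, prevalence=0.15,
                   subthreshold=0.25, alpha=0.05, reps=200, seed=1):
    """Estimate power as the fraction of replicates in which each analysis rejects."""
    rng = random.Random(seed)
    crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) ** 2  # chi-square(1) critical value
    hits = {"case_control": 0, "ordinal": 0, "case_asymptomatic": 0}
    for _ in range(reps):
        g, y = simulate_sample(n, maf, beta, prevalence, subthreshold, rng)
        # Standard case-control: subthreshold individuals count as controls.
        binary = [1 if v == 2 else 0 for v in y]
        # Case-asymptomatic control: subthreshold individuals are dropped.
        kept = [(a, 1 if v == 2 else 0) for a, v in zip(g, y) if v != 1]
        g_kept = [a for a, _ in kept]
        y_kept = [b for _, b in kept]
        hits["case_control"] += trend_chi2(g, binary) > crit
        hits["ordinal"] += trend_chi2(g, y) > crit
        hits["case_asymptomatic"] += trend_chi2(g_kept, y_kept) > crit
    return {name: count / reps for name, count in hits.items()}


if __name__ == "__main__":
    print(estimate_power())
```

Calling `estimate_power` over a grid of `beta`, `maf`, `prevalence`, and `subthreshold` values mirrors the factors varied in the simulation study. The default `alpha` of 0.05 is far more lenient than a genome-wide threshold and is used here only to keep the run time short.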