Paye Sanjana M, Edge Michael D
Department of Quantitative and Computational Biology, University of Southern California.
bioRxiv. 2024 Dec 17:2024.12.17.628943. doi: 10.1101/2024.12.17.628943.
Case-control genome-wide association studies (GWAS) are often used to find associations between genetic variants and diseases. When designing a case-control GWAS, researchers must decide how many cases and how many controls to include. Because cases and controls differ in availability and cost, case-control GWAS use a range of case fractions. Connections between variants and diseases are made using association statistics, including r². Previous work in population genetics has shown that linkage disequilibrium (LD) statistics, including r², are bounded by the allele frequencies in the population being studied. Since varying the case fraction changes sample allele frequencies, we use the known bounds on r² to explore how the fraction of cases included in a study can affect statistical power to detect associations. We analyze a simple mathematical model and use simulations to study, under various conditions, a quantity proportional to the χ² noncentrality parameter, which is closely related to r². Varying the case fraction changes the χ² noncentrality parameter, and by extension the statistical power, with effects that depend on the dominance, penetrance, and frequency of the risk allele. Our framework explains previously observed results, such as asymmetries in power to detect risk vs. protective alleles, and the fact that a balanced sample of cases and controls does not always give the best power to detect associations, particularly for highly penetrant minor risk alleles that are either dominant or recessive. We show by simulation that our results can serve as a rough guide to statistical power for association tests other than χ² tests of independence.
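The link the abstract relies on, that the χ² noncentrality parameter scales with r² and that r² is capped by the sample allele frequency and the case fraction, can be illustrated with a small calculation. The sketch below is not the authors' code: it assumes Hardy-Weinberg genotype frequencies in the population, hypothetical genotype penetrances f0, f1, f2 (for 0, 1, 2 copies of the risk allele), and an allele-level 2×2 table of allele vs. case status; all function names are hypothetical. It computes the sample r² for a case fraction phi, the standard upper bound on r² for a positive association given the two marginal frequencies, and the corresponding noncentrality parameter, approximately n·r² for a 2×2 table of n alleles.

```python
"""Sketch (assumptions ours): HWE in the population, genotype penetrances
(f0, f1, f2), and an allele-level 2x2 table of risk allele vs. case status."""


def genotype_freqs(p):
    """HWE frequencies of 0, 1, 2 copies of a risk allele with frequency p."""
    q = 1.0 - p
    return (q * q, 2 * p * q, p * p)


def allele_freq_given_status(p, pen, case):
    """Risk-allele frequency among cases (case=True) or controls (case=False)."""
    g = genotype_freqs(p)
    # P(genotype | status) is proportional to P(genotype) * P(status | genotype)
    w = [gi * (fi if case else 1.0 - fi) for gi, fi in zip(g, pen)]
    return (w[1] + 2 * w[2]) / (2 * sum(w))


def sample_r2(p, pen, phi):
    """Allele-level r^2 between the risk allele and case status in a sample
    whose case fraction is phi (0 < phi < 1). Returns (r2, sample allele freq)."""
    p_case = allele_freq_given_status(p, pen, True)
    p_ctrl = allele_freq_given_status(p, pen, False)
    p_s = phi * p_case + (1 - phi) * p_ctrl  # sample allele frequency
    d = phi * (1 - phi) * (p_case - p_ctrl)  # D = P(allele, case) - p_s * phi
    return d * d / (p_s * (1 - p_s) * phi * (1 - phi)), p_s


def r2_max_positive(p_a, p_b):
    """Upper bound on r^2 for two positively associated binary variables
    (D > 0) with marginal frequencies p_a and p_b."""
    lo, hi = min(p_a, p_b), max(p_a, p_b)
    return lo * (1 - hi) / ((1 - lo) * hi)


if __name__ == "__main__":
    pen = (0.01, 0.01, 0.8)  # hypothetical recessive, highly penetrant model
    n_individuals = 2000     # hypothetical sample size (2N alleles)
    for phi in (0.1, 0.3, 0.5, 0.7, 0.9):
        r2, p_s = sample_r2(0.1, pen, phi)
        ncp = 2 * n_individuals * r2  # approximate NCP, n * r^2 with n = 2N
        print(f"phi={phi:.1f} r2={r2:.4f} "
              f"bound={r2_max_positive(p_s, phi):.4f} ncp~{ncp:.1f}")
```

Because the sample size is held fixed in the loop, comparing r² across case fractions is equivalent to comparing the approximate noncentrality parameter; the bound column shows how the achievable r² shifts as phi moves the sample allele frequency.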