Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, 75 Commercial Rd, Melbourne 3004, Victoria, Australia.
Department of Clinical Pathology, University of Melbourne, Parkville 3010, Victoria, Australia.
Nucleic Acids Res. 2018 Dec 14;46(22):e133. doi: 10.1093/nar/gky780.
Investigation of the genetic architecture of gene expression traits has aided interpretation of disease and trait-associated genetic variants; however, key aspects of expression quantitative trait loci (eQTL) study design and analysis remain understudied. We used extensive, empirically driven simulations to explore eQTL study design and the performance of various analysis strategies. Across multiple testing correction methods, false discoveries of genes with eQTLs (eGenes) were substantially inflated when false discovery rate (FDR) control was applied to all tests, and were appropriately controlled only when hierarchical procedures were used. With small sample sizes, all multiple testing correction procedures had low power and inflated FDR for eGenes whose causal SNPs had low allele frequencies (e.g. frequency <10% in 100 samples), indicating that even moderately low frequency eQTL SNPs (eSNPs) in these studies are enriched for false discoveries. In scenarios with ≥80% power, the top eSNP was the true simulated eSNP 90% of the time, but substantially less frequently for very common eSNPs (minor allele frequencies >25%). Overestimation of eQTL effect sizes, the so-called 'Winner's Curse', was common in low and moderate power settings. To address this, we developed a bootstrap method (BootstrapQTL) that led to more accurate effect size estimation. These insights provide a foundation for future eQTL studies, especially those with sampling constraints and subtly different conditions.
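As a rough illustration of what a hierarchical correction can look like, the sketch below first corrects within each gene and then controls FDR across genes. The abstract does not say which local and global procedures were evaluated, so the pairing used here (Bonferroni within a gene, Benjamini-Hochberg across genes) is an assumption, and the function names, gene labels and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def hierarchical_egenes(tests, alpha=0.05):
    """Call eGenes from a dict of gene -> {snp: nominal p-value}.

    Assumed two-step scheme (not necessarily the one used in the paper):
    1. local: Bonferroni-adjust the smallest p-value within each gene;
    2. global: Benjamini-Hochberg across the per-gene p-values.
    """
    genes = sorted(tests)
    gene_p = [min(1.0, min(tests[g].values()) * len(tests[g])) for g in genes]
    gene_q = benjamini_hochberg(gene_p)
    return {g: q for g, q in zip(genes, gene_q) if q <= alpha}


# Hypothetical nominal p-values for three genes.
example = {
    "geneA": {"rs1": 1e-6, "rs2": 0.3},
    "geneB": {"rs3": 0.04, "rs4": 0.5, "rs5": 0.9},
    "geneC": {"rs6": 0.2},
}
egenes = hierarchical_egenes(example)
```

In this toy input only `geneA` is called an eGene: `geneB` has a nominally significant SNP (p = 0.04), but it does not survive the within-gene correction for three tests. Pooling all SNP-gene tests into a single FDR step would skip that per-gene adjustment, which is the setting the abstract reports as inflating false eGene discoveries.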