Risk Analysis Research Center, The Institute of Statistical Mathematics, Tachikawa, Tokyo, 190-8562, Japan.
Department of Data Science, The Institute of Statistical Mathematics, Tachikawa, Tokyo, 190-8562, Japan.
Eur J Hum Genet. 2018 Jul;26(7):1038-1048. doi: 10.1038/s41431-018-0125-3. Epub 2018 Mar 9.
Although enormous resources have been devoted to discovering disease-related genetic variants, especially in genome-wide association studies (GWASs), only a small fraction of the estimated heritability can be explained by the results. This is the so-called missing heritability problem. From a statistical perspective, one of the most important issues is the conventional use of overly conservative multiple testing strategies based on controlling the familywise error rate (FWER), in particular the genome-wide significance threshold of P < 5 × 10⁻⁸. To help resolve this problem, we performed comprehensive re-assessments of currently available strategies using recently published, extremely large-scale GWAS data sets of rheumatoid arthritis and schizophrenia (>50,000 subjects). Under the standard FWER-based strategy, the estimated statistical power, averaged over all disease-related genetic variants, was only 0.09% for the rheumatoid arthritis data and 0.04% for the schizophrenia data. To design more efficient strategies, we also conducted an extensive comparison of multiple testing strategies by applying false discovery rate (FDR)-controlling procedures to these data sets and to simulations, and found that the FDR-based procedures achieved higher power than the FWER-based strategy, even at a strict FDR level (e.g., FDR = 1%). We also discuss a useful alternative measure, namely "partial power," which is the power averaged over the clinically and biologically meaningful genetic factors with the largest effects. Simulation results suggest that the FDR-based procedures can achieve sufficient partial power (>80%) for detecting these factors (odds ratios of >1.05) with 80,000 subjects, and thus partial power may be a useful measure for defining realistic objectives of future GWASs.
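The contrast between a fixed genome-wide threshold and an FDR-controlling procedure can be illustrated with a minimal sketch. The abstract does not state which FDR procedures were compared, so the Benjamini-Hochberg step-up procedure is used here as an assumption. The function names and the toy P values are hypothetical and are not taken from the study.

```python
# Illustrative sketch only: Benjamini-Hochberg is an assumed choice of
# FDR-controlling procedure; the abstract does not name the ones it used.

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def fwer_discoveries(p_values, alpha=GENOME_WIDE_ALPHA):
    """Return indices of tests whose P value is below a fixed threshold."""
    return [i for i, p in enumerate(p_values) if p < alpha]


def bh_discoveries(p_values, fdr_level=0.01):
    """Return indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Rank the tests from the smallest to the largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr_level.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr_level:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


# Hypothetical P values for ten variants (made up for illustration).
toy_p = [1e-9, 2e-6, 1e-5, 3e-4, 0.02, 0.2, 0.5, 0.7, 0.9, 0.95]

print("fixed threshold:", fwer_discoveries(toy_p))
print("BH at FDR = 1%:", bh_discoveries(toy_p))
```

On these toy values, the fixed threshold retains only the first variant, whereas the step-up procedure at FDR = 1% retains the first four. This shows the direction of the power difference described in the abstract, not its magnitude.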