Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China.
Stat Methods Med Res. 2018 Sep;27(9):2795-2808. doi: 10.1177/0962280216687168. Epub 2017 Jan 8.
In genome-wide association studies, we normally discover associations between genetic variants and diseases/traits in primary studies and validate the findings in replication studies. We consider associations identified in both the primary and the replication study to be true findings. An important question in this two-stage setting is how to determine the significance levels of the two studies. Traditional methods determine the significance levels of the primary and replication studies separately. We argue that this separate determination strategy reduces the power of the overall two-stage study. We therefore propose a novel method that determines the significance levels jointly. Our method is a reanalysis method that requires summary statistics from both studies. It finds the most powerful pair of significance levels while controlling the false discovery rate in the two-stage study. To benefit from the power improvement of joint determination, single nucleotide polymorphisms must be selected for replication at a less stringent significance level. This is common practice in studies designed for discovery purposes. We suggest that the practice is also suitable for studies with a validation purpose, in order to identify more true findings. Simulation experiments show that our method provides more power than traditional methods and that the false discovery rate is well controlled. Empirical experiments on datasets of five diseases/traits demonstrate that our method can help identify more associations. The R package is available at: http://bioinformatics.ust.hk/RFdr.html
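To make the contrast between separate and joint determination concrete, the following is a minimal toy sketch in Python. It is not the algorithm implemented in the RFdr package, and every name in it (`simulate_pvalues`, `joint_levels`, `separate_levels`, the grid, the effect size) is hypothetical. It assumes that the two studies are independent and that every null SNP is null in both studies, so that the expected number of false findings at levels (a1, a2) is at most m * a1 * a2. Under those assumptions, it searches a grid for the pair of levels that yields the most findings while keeping this plug-in false discovery rate estimate below a target q, and compares the result with Bonferroni-style levels set separately in each stage.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_pvalues(m, n_true, effect, rng):
    """Toy data: one-sided p-values for m SNPs in two independent studies.

    The first n_true SNPs carry a z-score shift of `effect` in both studies;
    all other SNPs are null in both studies (an assumption of this sketch).
    """
    p1, p2, truth = [], [], []
    for i in range(m):
        is_true = i < n_true
        shift = effect if is_true else 0.0
        p1.append(1.0 - STD_NORMAL.cdf(rng.gauss(shift, 1.0)))
        p2.append(1.0 - STD_NORMAL.cdf(rng.gauss(shift, 1.0)))
        truth.append(is_true)
    return p1, p2, truth


def count_findings(p1, p2, a1, a2):
    """Number of SNPs significant at a1 in the primary and a2 in the replication."""
    return sum(1 for x, y in zip(p1, p2) if x <= a1 and y <= a2)


def joint_levels(p1, p2, q, grid):
    """Grid search for the (a1, a2) pair with the most findings whose
    plug-in FDR estimate m * a1 * a2 / findings stays at or below q.

    Returns (a1, a2, findings, estimated_fdr), or None if no pair qualifies.
    """
    m = len(p1)
    best = None
    for a1 in grid:
        for a2 in grid:
            r = count_findings(p1, p2, a1, a2)
            if r == 0:
                continue
            est_fdr = m * a1 * a2 / r
            if est_fdr <= q and (best is None or r > best[2]):
                best = (a1, a2, r, est_fdr)
    return best


def separate_levels(p1, alpha):
    """Separate determination: Bonferroni over all SNPs in the primary study,
    then Bonferroni over the selected SNPs in the replication study."""
    a1 = alpha / len(p1)
    n_selected = sum(1 for x in p1 if x <= a1)
    a2 = alpha / max(n_selected, 1)
    return a1, a2


if __name__ == "__main__":
    rng = random.Random(2017)
    p1, p2, truth = simulate_pvalues(m=2000, n_true=100, effect=3.0, rng=rng)
    grid = [10 ** (-k / 4) for k in range(4, 29)]  # 0.1 down to 1e-7

    s1, s2 = separate_levels(p1, alpha=0.05)
    print("separate:", s1, s2, count_findings(p1, p2, s1, s2))

    best = joint_levels(p1, p2, q=0.05, grid=grid)
    if best is not None:
        a1, a2, findings, est_fdr = best
        print("joint:   ", a1, a2, findings, "estimated FDR", est_fdr)
```

In this toy setting the joint search is free to pick a much less stringent primary-study level than the Bonferroni one, which mirrors the point above about selecting SNPs for replication at a less stringent level. The plug-in estimate is a deliberately crude stand-in for proper false discovery rate control and should not be used for real analyses.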