Sofer Tamar, Heller Ruth, Bogomolov Marina, Avery Christy L, Graff Mariaelisa, North Kari E, Reiner Alex P, Thornton Timothy A, Rice Kenneth, Benjamini Yoav, Laurie Cathy C, Kerr Kathleen F
Department of Biostatistics, University of Washington, Seattle, WA, USA.
Department of Statistics and Operations Research, Tel-Aviv University, Tel-Aviv, Israel.
Genet Epidemiol. 2017 Apr;41(3):251-258. doi: 10.1002/gepi.22029. Epub 2017 Jan 15.
In genome-wide association studies (GWAS), "generalization" is the replication of a genotype-phenotype association in a population with different ancestry than the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family-wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two-stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow-up studies. We develop the directional generalization FWER (FWER_g) and FDR (FDR_g) controlling r-values, which are used to declare associations as generalized. This framework also applies to generalization testing of a published list of single nucleotide polymorphism (SNP)-trait associations. Our methods control FWER_g or FDR_g under various SNP selection rules based on P-values in the discovery study. We find that it is often beneficial to use a more lenient P-value threshold than the genome-wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P-values <5×10^-8 (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. However, when testing all SNPs with P-values <6.6×10^-5 (89 regions), we generalized SNPs from 27 regions.
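The two-stage idea described in the abstract (select SNPs by discovery P-value, then test them in the follow-up study in the direction seen at discovery) can be illustrated with a minimal sketch. This is not the r-value procedure of the paper: it substitutes a plain Bonferroni correction over the selected SNPs, and the function names, the input layout (discovery P-value and effect estimate per SNP, follow-up z-score per SNP) and the toy numbers are all assumptions made for illustration.

```python
import math


def one_sided_p(z, direction):
    """One-sided normal P-value for a follow-up z-score.

    `direction` is +1 or -1, the sign of the discovery effect estimate;
    the P-value is small only when z points the same way.
    """
    return 0.5 * math.erfc(direction * z / math.sqrt(2))


def directional_followup(discovery, followup, select_threshold, alpha=0.05):
    """Simplified stand-in for generalization testing (hypothetical helper).

    discovery: dict SNP -> (two-sided discovery P-value, effect estimate)
    followup:  dict SNP -> follow-up z-score
    Returns the SNPs declared generalized after Bonferroni correction
    over the SNPs selected at `select_threshold`.
    """
    # Stage 1: keep SNPs passing the discovery threshold and present in follow-up.
    selected = [
        snp for snp, (p, _beta) in discovery.items()
        if p < select_threshold and snp in followup
    ]
    generalized = []
    for snp in selected:
        _p, beta = discovery[snp]
        direction = 1 if beta > 0 else -1
        # Stage 2: directional test, Bonferroni-adjusted for the number selected.
        p_adj = min(1.0, one_sided_p(followup[snp], direction) * len(selected))
        if p_adj <= alpha:
            generalized.append(snp)
    return generalized


# Toy inputs (made up): snp4 has opposite effect directions in the two studies.
discovery = {
    "snp1": (1e-9, 0.30),
    "snp2": (2e-6, -0.20),
    "snp3": (0.2, 0.10),
    "snp4": (1e-6, 0.25),
}
followup = {"snp1": 4.0, "snp2": -3.5, "snp3": 5.0, "snp4": -3.0}

strict = directional_followup(discovery, followup, 5e-8)
lenient = directional_followup(discovery, followup, 6.6e-5)
```

In this toy example the lenient threshold admits more SNPs into follow-up testing at the cost of a larger multiplicity correction, which mirrors the trade-off the abstract describes; a SNP whose follow-up effect has the opposite sign is never declared generalized, however strong its follow-up signal.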