Li Yan, Lei Haochen, Wen Xiaoquan, Cao Hongyuan
School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, Jilin 130022, China; School of Mathematics, Jilin University, Changchun, Jilin 130012, China.
Department of Statistics, Florida State University, Tallahassee, FL 32306, USA.
Am J Hum Genet. 2024 May 2;111(5):966-978. doi: 10.1016/j.ajhg.2024.04.004.
Replicability is the cornerstone of modern scientific research. Reliably identifying genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provides stronger evidence for the findings. Current replicability analysis assumes that single-nucleotide polymorphisms (SNPs) are independent and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, that detects replicable SNPs associated with the phenotype from two GWASs while accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable and platform independent, and it is more powerful than existing replicability analysis methods while effectively controlling the false discovery rate (FDR). Through analyses of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.
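The abstract describes the four-state HMM only in words. The sketch below shows the general idea under simplifying assumptions; it is not the ReAD implementation or its API. It assumes the hidden state of each SNP is a pair (θ1, θ2) ∈ {0,1}², where 1 means non-null in that study and only (1,1) counts as replicable. It also assumes the initial distribution and transition matrix are known rather than estimated, null p values are Uniform(0,1), non-null p values follow a hypothetical Beta(a, 1) density, and the two studies are independent given the state. Scaled forward-backward gives each SNP's posterior probability of not being replicable. A step-up rule in the style of local-index-of-significance (LIS) procedures then selects SNPs at a target FDR. The names `emission`, `posterior_states`, `replicability_lis`, and `lis_step_up` are illustrative.

```python
from itertools import product

# Hidden state per SNP: (theta1, theta2), where 1 means "non-null" in that study.
# Order: (0,0), (0,1), (1,0), (1,1); only (1,1) counts as replicable here.
STATES = list(product((0, 1), repeat=2))


def emission(p, theta, a):
    """Density of one p value in (0, 1].

    Null: Uniform(0, 1), density 1. Non-null: an ASSUMED Beta(a, 1) density
    a * p**(a - 1) with 0 < a < 1, which piles mass near p = 0.
    """
    return 1.0 if theta == 0 else a * p ** (a - 1.0)


def posterior_states(p1, p2, init, trans, a1, a2):
    """Scaled forward-backward for a 4-state HMM with known (assumed) parameters.

    Returns post[i][j] = P(state at SNP i is STATES[j] | all p values).
    The two studies are assumed independent given the hidden state.
    """
    n, k = len(p1), len(STATES)
    e = [[emission(p1[i], s[0], a1) * emission(p2[i], s[1], a2) for s in STATES]
         for i in range(n)]

    # Forward pass, rescaling at each SNP to avoid overflow/underflow.
    fwd, scale = [], []
    prev = None
    for i in range(n):
        if i == 0:
            row = [init[j] * e[0][j] for j in range(k)]
        else:
            row = [e[i][j] * sum(prev[m] * trans[m][j] for m in range(k))
                   for j in range(k)]
        c = sum(row)
        prev = [x / c for x in row]
        fwd.append(prev)
        scale.append(c)

    # Backward pass, reusing the forward scaling constants.
    bwd = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [sum(trans[m][j] * e[i + 1][j] * bwd[i + 1][j] for j in range(k))
                  / scale[i + 1] for m in range(k)]

    # Combine and renormalize to get posterior state probabilities per SNP.
    post = []
    for i in range(n):
        g = [fwd[i][j] * bwd[i][j] for j in range(k)]
        total = sum(g)
        post.append([x / total for x in g])
    return post


def replicability_lis(post):
    """Posterior probability that each SNP is NOT replicable, i.e. not in (1,1)."""
    j11 = STATES.index((1, 1))
    return [1.0 - row[j11] for row in post]


def lis_step_up(lis, alpha):
    """Reject the k SNPs with smallest LIS, where k is the largest count whose
    running mean of sorted LIS (estimated FDR of the rejections) is <= alpha."""
    order = sorted(range(len(lis)), key=lambda i: lis[i])
    running, k = 0.0, 0
    for r, i in enumerate(order, start=1):
        running += lis[i]
        if running / r <= alpha:
            k = r
    return sorted(order[:k])


# Toy input: made-up p values, not data from the paper.
p1 = [0.5, 0.7, 1e-6, 2e-7, 1e-5, 0.4, 0.9, 0.6]
p2 = [0.3, 0.8, 3e-6, 1e-6, 5e-5, 0.6, 0.2, 0.7]
init = [0.85, 0.05, 0.05, 0.05]  # assumed initial distribution over STATES
trans = [[0.9 if i == j else 0.1 / 3 for j in range(4)] for i in range(4)]  # sticky chain

post = posterior_states(p1, p2, init, trans, a1=0.2, a2=0.2)
lis = replicability_lis(post)
rejected = lis_step_up(lis, alpha=0.05)
print(rejected)
```

On this toy input, only the contiguous run of SNPs with tiny p values in both studies should be selected. Neighbouring SNPs with moderate p values still get some posterior weight on (1,1) through the sticky transitions, which is how the HMM borrows information from adjacent locations. A real analysis would estimate the transition matrix and non-null densities from the data instead of fixing them as done here.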