A powerful approach to identify replicable variants in genome-wide association studies.

Authors

Li Yan, Lei Haochen, Wen Xiaoquan, Cao Hongyuan

Affiliations

School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, Jilin 130022, China; School of Mathematics, Jilin University, Changchun, Jilin 130012, China.

Department of Statistics, Florida State University, Tallahassee, FL 32306, USA.

Publication information

Am J Hum Genet. 2024 May 2;111(5):966-978. doi: 10.1016/j.ajhg.2024.04.004.

Abstract

Replicability is the cornerstone of modern scientific research. Reliable identifications of genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provide stronger evidence for the findings. Current replicability analysis relies on the independence assumption among single-nucleotide polymorphisms (SNPs) and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, to detect replicable SNPs associated with the phenotype from two GWASs accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable, platform independent, and more powerful than existing replicability analysis methods with effective false discovery rate control. Through analysis of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.
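
The abstract describes a four-state HMM over two p-value sequences followed by FDR-controlled selection. The sketch below illustrates that general idea only and is not the ReAD implementation. Everything specific in it is an assumption made for illustration: the function names, the Uniform(0, 1) null and Beta(a, 1) non-null emission densities, and the fixed transition matrix supplied by the caller (the abstract does not say how these quantities are modeled or estimated). It computes the posterior probability that each SNP is non-null in both studies with a scaled forward-backward pass, then applies a standard step-up rule on the posterior null probabilities.

```python
import numpy as np

# Hidden states: (non-null in study 1, non-null in study 2).
# Index 3, i.e. (1, 1), is the "replicable" state.
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]

def emission(p1, p2, a1=0.2, a2=0.2):
    """Joint emission densities, shape (m, 4).
    ASSUMPTION: null p values are Uniform(0, 1); non-null p values are
    Beta(a, 1) with density a * p**(a - 1). Studies are independent given state."""
    p1 = np.clip(np.asarray(p1, dtype=float), 1e-100, 1.0)
    p2 = np.clip(np.asarray(p2, dtype=float), 1e-100, 1.0)
    f1 = np.stack([np.ones_like(p1), a1 * p1 ** (a1 - 1.0)], axis=1)
    f2 = np.stack([np.ones_like(p2), a2 * p2 ** (a2 - 1.0)], axis=1)
    return np.stack([f1[:, s1] * f2[:, s2] for s1, s2 in STATES], axis=1)

def posterior(p1, p2, init, trans, a1=0.2, a2=0.2):
    """Scaled forward-backward pass.
    Returns P(state_j = s | all p values) as an array of shape (m, 4)."""
    e = emission(p1, p2, a1, a2)
    m = e.shape[0]
    alpha = np.zeros((m, 4))
    beta = np.ones((m, 4))
    alpha[0] = init * e[0]
    alpha[0] /= alpha[0].sum()
    for j in range(1, m):                      # forward recursion
        alpha[j] = (alpha[j - 1] @ trans) * e[j]
        alpha[j] /= alpha[j].sum()
    for j in range(m - 2, -1, -1):             # backward recursion
        beta[j] = trans @ (e[j + 1] * beta[j + 1])
        beta[j] /= beta[j].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

def select_replicable(post, level=0.05):
    """Step-up rule: reject the k SNPs with the smallest posterior
    probability of NOT being replicable, for the largest k whose
    running mean of those probabilities is at most `level`."""
    null_prob = 1.0 - post[:, 3]
    order = np.argsort(null_prob)
    running_mean = np.cumsum(null_prob[order]) / np.arange(1, len(order) + 1)
    passing = np.nonzero(running_mean <= level)[0]
    mask = np.zeros(len(order), dtype=bool)
    if passing.size:
        mask[order[: passing[-1] + 1]] = True
    return mask
```

A hypothetical call would pass two aligned p-value arrays ordered by genomic position, an initial state distribution of length 4, and a 4 x 4 row-stochastic transition matrix: `mask = select_replicable(posterior(p1, p2, init, trans), level=0.05)`. Because neighboring SNPs share information through the transition matrix, a SNP inside a run of jointly small p values receives a higher replicability posterior than it would under an independence assumption.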
