Strategies and issues in the detection of pathway enrichment in genome-wide association studies.

Authors

Hong Mun-Gwan, Pawitan Yudi, Magnusson Patrik K E, Prince Jonathan A

Affiliation

Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.

Publication information

Hum Genet. 2009 Aug;126(2):289-301. doi: 10.1007/s00439-009-0676-z. Epub 2009 May 1.

DOI:10.1007/s00439-009-0676-z

PMID:19408013

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2865249/

Abstract

A fundamental question in human genetics is the degree to which the polygenic character of complex traits derives from polymorphism in genes with similar or with dissimilar functions. The many genome-wide association studies now being performed offer an opportunity to investigate this, and although early attempts are emerging, new tools and modeling strategies still need to be developed and deployed. Towards this goal, we implemented a new algorithm to facilitate the transition from genetic marker lists (principally those generated by PLINK) to pathway analyses of representational gene sets in either threshold or threshold-free downstream applications (e.g. DAVID, GSEA-P, and Ingenuity Pathway Analysis). This was applied to several large genome-wide association studies covering diverse human traits that included type 2 diabetes, Crohn's disease, and plasma lipid levels. Validation of this approach was obtained for plasma HDL levels, where functional categories related to lipid metabolism emerged as the most significant in two independent studies. From analyses of these samples, we highlight and address numerous issues related to this strategy, including appropriate gene based correction statistics, the utility of imputed versus non-imputed marker sets, and the apparent enrichment of pathways due solely to the positional clustering of functionally related genes. The latter in particular emphasizes the importance of studies that directly tie genetic variation to functional characteristics of specific genes. The software freely provided that we have called ProxyGeneLD may resolve an important bottleneck in pathway-based analyses of genome-wide association data. This has allowed us to identify at least one replicable case of pathway enrichment but also to highlight functional gene clustering as a potentially serious problem that may lead to spurious pathway findings if not corrected.
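The abstract mentions "gene based correction statistics" without giving the formula. As a rough illustration only, the sketch below shows one common way to collapse SNP-level p-values into a gene-level value: take the best p-value per gene and multiply it by the number of approximately independent signals, where SNPs in high linkage disequilibrium (LD) count as a single cluster. This is an assumption for illustration, not a description of what ProxyGeneLD actually computes. The function name, the input mappings, and the Bonferroni-style adjustment are all hypothetical.

```python
from collections import defaultdict


def gene_level_pvalues(snp_pvalues, snp_to_gene, snp_to_ld_cluster):
    """Collapse SNP p-values to one adjusted p-value per gene.

    Hypothetical sketch: the best SNP p-value in each gene is multiplied by
    the number of LD clusters (plus unclustered singleton SNPs) in that gene,
    a Bonferroni-style correction for the number of independent tests.
    """
    best = {}                    # gene -> smallest SNP p-value seen so far
    signals = defaultdict(set)   # gene -> set of independent signal labels
    for snp, p in snp_pvalues.items():
        gene = snp_to_gene.get(snp)
        if gene is None:         # SNP not assigned to any gene: skip it
            continue
        best[gene] = min(p, best.get(gene, 1.0))
        # SNPs without an LD cluster count as their own singleton signal
        signals[gene].add(snp_to_ld_cluster.get(snp, snp))
    # Cap adjusted values at 1.0 so they remain valid p-values
    return {g: min(1.0, best[g] * len(signals[g])) for g in best}


# Toy input (made-up SNP IDs, genes, and p-values)
snp_pvalues = {"rs1": 0.001, "rs2": 0.002, "rs3": 0.04, "rs4": 0.03}
snp_to_gene = {"rs1": "GENE_A", "rs2": "GENE_A", "rs3": "GENE_A", "rs4": "GENE_B"}
snp_to_ld_cluster = {"rs1": "c1", "rs2": "c1"}  # rs1 and rs2 are in high LD

adjusted = gene_level_pvalues(snp_pvalues, snp_to_gene, snp_to_ld_cluster)
```

In this toy input, GENE_A has three SNPs but only two independent signals (the cluster `c1` and the singleton `rs3`), so its best p-value is multiplied by 2 rather than 3. Without some correction of this kind, genes covered by many markers would tend to rank higher simply because they have more chances to show a small p-value.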

Similar articles

Genome-wide genetic analyses highlight mitogen-activated protein kinase (MAPK) signaling in the pathogenesis of endometriosis.

Hum Reprod. 2017 Apr 1;32(4):780-793. doi: 10.1093/humrep/dex024.

SNP-based pathway enrichment analysis for genome-wide association studies.

BMC Bioinformatics. 2011 Apr 15;12:99. doi: 10.1186/1471-2105-12-99.

Integrating genetic and gene expression evidence into genome-wide association analysis of gene sets.

Genome Res. 2012 Feb;22(2):386-97. doi: 10.1101/gr.124370.111. Epub 2011 Sep 22.

Pathway analysis supports association of nonsyndromic cryptorchidism with genetic loci linked to cytoskeleton-dependent functions.

Hum Reprod. 2015 Oct;30(10):2439-51. doi: 10.1093/humrep/dev180. Epub 2015 Jul 24.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Haplotype function score improves biological interpretation and cross-ancestry polygenic prediction of human complex traits.

Elife. 2024 Apr 19;12:RP92574. doi: 10.7554/eLife.92574.

Integrated enrichment analysis of variants and pathways in genome-wide association studies indicates central role for IL-2 signaling genes in type 1 diabetes, and cytokine signaling genes in Crohn's disease.

PLoS Genet. 2013;9(10):e1003770. doi: 10.1371/journal.pgen.1003770. Epub 2013 Oct 3.

Type 2 diabetes genetic loci informed by multi-trait associations point to disease mechanisms and subtypes: A soft clustering analysis.

PLoS Med. 2018 Sep 21;15(9):e1002654. doi: 10.1371/journal.pmed.1002654. eCollection 2018 Sep.

High-throughput genetic clustering of type 2 diabetes loci reveals heterogeneous mechanistic pathways of metabolic disease.

Diabetologia. 2023 Mar;66(3):495-507. doi: 10.1007/s00125-022-05848-6. Epub 2022 Dec 20.

Cited by

Time is ticking faster for long genes in aging.

Trends Genet. 2024 Apr;40(4):299-312. doi: 10.1016/j.tig.2024.01.009. Epub 2024 Mar 21.

Genome-Wide Gene-Set Analysis Approaches in Amyotrophic Lateral Sclerosis.

J Pers Med. 2022 Nov 20;12(11):1932. doi: 10.3390/jpm12111932.

PaintOmics 4: new tools for the integrative analysis of multi-omics datasets supported by multiple pathway databases.

Nucleic Acids Res. 2022 Jul 5;50(W1):W551-W559. doi: 10.1093/nar/gkac352.

Identification of Atrial Fibrillation-Associated Genes and Using Genome-Wide Association and Transcriptome Expression Profile Data on Left-Right Atrial Appendages.

Front Genet. 2021 Jun 30;12:696591. doi: 10.3389/fgene.2021.696591. eCollection 2021.

Pathway analysis for genome-wide genetic variation data: Analytic principles, latest developments, and new opportunities.

J Genet Genomics. 2021 Mar 20;48(3):173-183. doi: 10.1016/j.jgg.2021.01.007. Epub 2021 Feb 26.

Powerful gene set analysis in GWAS with the Generalized Berk-Jones statistic.

PLoS Genet. 2019 Mar 15;15(3):e1007530. doi: 10.1371/journal.pgen.1007530. eCollection 2019 Mar.

Bioinformatics Analysis of Key Genes and Pathways for Medulloblastoma as a Therapeutic Target.

Asian Pac J Cancer Prev. 2019 Jan 25;20(1):221-227. doi: 10.31557/APJCP.2019.20.1.221.

Strategies for Pathway Analysis Using GWAS and WGS Data.

Curr Protoc Hum Genet. 2019 Jan;100(1):e79. doi: 10.1002/cphg.79. Epub 2018 Nov 2.

An Exome-Wide Association Study Identifies New Susceptibility Loci for Age of Smoking Initiation in African- and European-American Populations.

Nicotine Tob Res. 2019 May 21;21(6):707-713. doi: 10.1093/ntr/ntx262.

Improving the detection of pathways in genome-wide association studies by combined effects of SNPs from Linkage Disequilibrium blocks.

Sci Rep. 2017 Jun 14;7(1):3512. doi: 10.1038/s41598-017-03826-2.
