SNP-based pathway enrichment analysis for genome-wide association studies.

Affiliation

Department of Computer Science, University of California, Irvine, USA.

Publication information

BMC Bioinformatics. 2011 Apr 15;12:99. doi: 10.1186/1471-2105-12-99.

DOI:10.1186/1471-2105-12-99

PMID:21496265

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3102637/

Abstract

BACKGROUND

Recently we have witnessed a surge of interest in using genome-wide association studies (GWAS) to discover the genetic basis of complex diseases. Many genetic variations, mostly in the form of single nucleotide polymorphisms (SNPs), have been identified in a wide spectrum of diseases, including diabetes, cancer, and psychiatric diseases. A common theme arising from these studies is that the genetic variations discovered by GWAS can explain only a small fraction of the genetic risk associated with complex diseases. New strategies and statistical approaches are needed to address this gap. One such approach is pathway analysis, which considers the genetic variations underlying a biological pathway jointly, rather than one at a time as in traditional GWAS. A critical challenge in pathway analysis is how to combine evidence of association over multiple SNPs within a gene and multiple genes within a pathway. Most current methods choose the most significant SNP from each gene as its representative, ignoring the joint action of multiple SNPs within a gene. This approach leads to preferential identification of genes with a greater number of SNPs.
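The bias toward SNP-rich genes mentioned above can be illustrated with a small simulation. The sketch below is not from the paper; it assumes SNP p-values are independent and uniform under the null hypothesis (real SNPs in a gene are correlated through linkage disequilibrium, which weakens but does not remove the effect). The function names are hypothetical. Under that assumption the smallest of n p-values has expected value 1/(n + 1), so a gene with more SNPs tends to have a smaller "best" p-value even when nothing is associated.

```python
import random

def min_p_under_null(n_snps, n_trials=5000, seed=0):
    """Average smallest p-value among n_snps independent null SNPs (simulated)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        total += min(rng.random() for _ in range(n_snps))
    return total / n_trials

def expected_min_p(n_snps):
    """Analytic mean of the minimum of n independent Uniform(0, 1) p-values."""
    return 1.0 / (n_snps + 1)

# Compare the simulated and analytic values for genes of different sizes.
for n in (1, 10, 100):
    print(n, round(min_p_under_null(n), 4), round(expected_min_p(n), 4))
```

The simulated averages shrink as the number of SNPs grows, which is why a most-significant-SNP rule needs some correction for gene size.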

RESULTS

We describe a SNP-based pathway enrichment method for GWAS. The method consists of two main steps: 1) for a given pathway, an adaptive truncated product statistic is used to identify all representative SNPs (potentially more than one) of each gene, the average number of representative SNPs per gene is calculated, and the representative SNPs of the genes in the pathway are then re-selected based on this number; and 2) all selected SNPs are ranked by the significance of their statistical association with a trait of interest, and a weighted Kolmogorov-Smirnov test is used to test whether the set of SNPs from a particular pathway is significantly enriched among the high ranks. We applied our method to two large, genetically distinct GWAS data sets of schizophrenia, one from European-American (EA) and the other from African-American (AA) samples. In the EA data set, we found 22 pathways with a nominal P-value less than or equal to 0.001 and a corresponding false discovery rate (FDR) below 5%. In the AA data set, we found 11 pathways at the same nominal P-value and FDR thresholds. Interestingly, 8 of these pathways overlap with those found in the EA sample. We have implemented our method in a Java software package called SNP Set Enrichment Analysis (SSEA), which has a user-friendly interface and is freely available at http://cbcl.ics.uci.edu/SSEA.
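The second step can be sketched in code. The abstract does not give the formula, so the block below assumes the running-sum form of the weighted Kolmogorov-Smirnov statistic that is commonly used in gene set enrichment analysis; it is not taken from SSEA. The function names, the `weight` exponent, and the toy SNP ids are hypothetical. The null distribution here comes from random SNP sets of the same size, which is a simplification chosen for brevity and may differ from the permutation scheme used in the paper.

```python
import math
import random

def enrichment_score(ranked_snps, scores, snp_set, weight=1.0):
    """Weighted KS-style running-sum statistic.

    ranked_snps: SNP ids sorted from most to least associated.
    scores: dict mapping SNP id -> association score (e.g. -log10 p).
    snp_set: set of SNP ids selected for one pathway.
    """
    hits = [s in snp_set for s in ranked_snps]
    n_miss = len(ranked_snps) - sum(hits)
    hit_total = sum(abs(scores[s]) ** weight
                    for s, h in zip(ranked_snps, hits) if h)
    if hit_total == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for s, h in zip(ranked_snps, hits):
        if h:
            # Step up by the SNP's share of the total weighted score.
            running += abs(scores[s]) ** weight / hit_total
        else:
            # Step down uniformly for SNPs outside the pathway.
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def permutation_p_value(ranked_snps, scores, snp_set, n_perm=200, seed=0):
    """Empirical p-value against random SNP sets of the same size (assumption)."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_snps, scores, snp_set)
    exceed = 0
    for _ in range(n_perm):
        null_set = set(rng.sample(ranked_snps, len(snp_set)))
        if enrichment_score(ranked_snps, scores, null_set) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Toy data: 100 SNPs with evenly spaced p-values; ids are made up.
p_values = {f"rs{i}": (i + 1) / 101 for i in range(100)}
scores = {s: -math.log10(p) for s, p in p_values.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
top_set = {"rs0", "rs1", "rs2", "rs4", "rs7"}
print(enrichment_score(ranked, scores, top_set))
print(permutation_p_value(ranked, scores, top_set))
```

A pathway whose SNPs cluster near the top of the ranking gets a score close to 1, while one whose SNPs sit at the bottom gets a score close to -1. In a full analysis the resulting p-values across pathways would still need multiple-testing control, such as the FDR threshold reported above.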

CONCLUSIONS

The SNP-based pathway enrichment method described here offers an alternative approach for analysing GWAS data. By applying it to schizophrenia GWAS data, we show that our method is able to identify statistically significant pathways and, importantly, pathways that can be replicated in large, genetically distinct samples.
