Finding type 2 diabetes causal single nucleotide polymorphism combinations and functional modules from genome-wide association data.

Author information

Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, South Korea.

Publication information

BMC Med Inform Decis Mak. 2013;13 Suppl 1(Suppl 1):S3. doi: 10.1186/1472-6947-13-S1-S3. Epub 2013 Apr 5.

Abstract

BACKGROUND

Because individual markers from a genome-wide association study (GWAS) have low statistical power, detecting causal single nucleotide polymorphisms (SNPs) for complex diseases is a challenge. SNP combinations have been suggested as a way to compensate for this low power, but evaluating SNP combinations from GWAS data is computationally expensive.
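To illustrate the scale of the problem, the following sketch counts how many SNP combinations an exhaustive search would have to score. The marker count of 500,000 is an assumed, illustrative figure for a genome-wide array, not a number taken from this study.

```python
from math import comb

# Assumed, illustrative number of genotyped SNPs (not from the study).
n_snps = 500_000

# Number of distinct unordered SNP combinations of each size.
pairs = comb(n_snps, 2)
triples = comb(n_snps, 3)

print(f"pairs:   {pairs:.3e}")
print(f"triples: {triples:.3e}")
```

Even at size two the search space exceeds 10^11 combinations, which is why the markers are filtered before combinations are examined.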

METHODS

We aim to detect type 2 diabetes (T2D) causal SNP combinations from a GWAS dataset with optimal filtration and to discover the biological meaning of the detected SNP combinations. Optimal filtration can enhance the statistical power of SNP combinations by comparing the error rates of SNP combinations obtained under various Bonferroni thresholds and p-value range-based thresholds, each combined with linkage disequilibrium (LD) pruning. T2D causal SNP combinations are selected from the resulting optimal SNP dataset using random forests with variable selection. T2D causal SNP combinations and genome-wide SNPs are mapped into functional modules using expanded gene set enrichment analysis (GSEA), which considers pathway, transcription factor (TF)-target, miRNA-target, gene ontology, and protein complex functional modules. Prediction error rates are also measured for SNP sets from functional module-based filtration, which selects SNPs within functional modules from genome-wide SNPs based on expanded GSEA.
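The filtration step described above might be sketched as follows. This is a minimal illustration and not the authors' implementation: the SNP identifiers, p-values, and genotype vectors are invented toy data, squared Pearson correlation of 0/1/2 genotype codes stands in for an LD measure, and the `max_r2=0.8` cutoff is an assumed value. The random forest and GSEA stages are not shown.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def filter_by_pvalue(pvalues, threshold):
    """Keep SNP IDs whose single-marker p-value is at or below the threshold."""
    return [snp for snp, p in pvalues.items() if p <= threshold]

def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 coding)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)

def ld_prune(snps, genotypes, pvalues, max_r2=0.8):
    """Greedy pruning: visit SNPs by ascending p-value and drop any SNP
    whose r^2 with an already kept SNP reaches max_r2 (assumed cutoff)."""
    kept = []
    for snp in sorted(snps, key=lambda s: pvalues[s]):
        if all(r_squared(genotypes[snp], genotypes[k]) < max_r2 for k in kept):
            kept.append(snp)
    return kept

# Hypothetical toy data: four SNPs genotyped in six individuals.
pvalues = {"rs_a": 1e-9, "rs_b": 5e-8, "rs_c": 3e-4, "rs_d": 0.2}
genotypes = {
    "rs_a": [0, 1, 2, 1, 0, 2],
    "rs_b": [0, 1, 2, 1, 0, 2],  # identical to rs_a, i.e. perfect LD
    "rs_c": [1, 1, 0, 2, 1, 1],
    "rs_d": [2, 2, 1, 0, 0, 1],
}

threshold = bonferroni_threshold(0.05, len(pvalues))
candidates = filter_by_pvalue(pvalues, threshold)
pruned = ld_prune(candidates, genotypes, pvalues)
print(threshold, candidates, pruned)
```

Repeating this with different thresholds would yield the alternative SNP datasets whose classification error rates are then compared to choose the optimal one.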

RESULTS

A T2D causal SNP combination containing 101 SNPs is selected from the Wellcome Trust Case Control Consortium (WTCCC) GWAS dataset using optimal filtration criteria, with an error rate of 10.25%. Matching the 101 SNPs with known T2D genes and functional modules reveals the relationships between T2D and SNP combinations. The prediction error rates of SNP sets from functional module-based filtration do not differ significantly from those of randomly selected SNP sets or of the T2D causal SNP combinations from optimal filtration.

CONCLUSIONS

We propose a detection method for complex disease causal SNP combinations from an optimal SNP dataset by using random forests with variable selection. Mapping the biological meanings of detected SNP combinations can help uncover complex disease mechanisms.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d8e9/3618247/c74c5dcdf9ba/1472-6947-13-S1-S3-1.jpg
