Applying compressed sensing to genome-wide association studies.

Affiliations

Mathematical Biology Section, Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, South Drive, Bethesda, MD 20814, USA.

Mathematical Biology Section, Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, South Drive, Bethesda, MD 20814, USA; Department of Psychology, University of Minnesota Twin Cities, 75 East River Parkway, Minneapolis, MN 55455, USA; Cognitive Genomics Lab, BGI Shenzhen, Yantian District, Shenzhen, China.

Publication information

Gigascience. 2014 Jun 16;3:10. doi: 10.1186/2047-217X-3-10. eCollection 2014.

DOI:10.1186/2047-217X-3-10

PMID:25002967

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4078394/

Abstract

BACKGROUND

The aim of a genome-wide association study (GWAS) is to isolate DNA markers for variants affecting phenotypes of interest. This is constrained by the fact that the number of markers often far exceeds the number of samples. Compressed sensing (CS) is a body of theory regarding signal recovery when the number of predictor variables (i.e., genotyped markers) exceeds the sample size. Its applicability to GWAS has not been investigated.
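To make the "more markers than samples" setting concrete, the following Python sketch simulates a GWAS-like dataset with n samples, p genotyped markers (p may exceed n), and a sparse coefficient vector with s nonzero entries. Everything here is an illustrative assumption rather than the authors' simulation protocol: the function name `simulate_gwas` is hypothetical, markers are drawn independently (no linkage disequilibrium) with a single hypothetical minor-allele frequency `maf`, nonzero effects are ±1, and the noise is scaled so that the genetic share of phenotypic variance equals h².

```python
import math
import random


def simulate_gwas(n, p, s, h2, maf=0.3, seed=0):
    """Hypothetical simulator: n samples, p markers, s causal markers, heritability h2 in (0, 1].

    Genotypes are independent 0/1/2 minor-allele counts (two Bernoulli draws per
    sample), standardized column by column. Returns (columns, y, beta, support).
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must lie in (0, 1]")
    rng = random.Random(seed)
    cols = []
    for _ in range(p):
        g = [sum(rng.random() < maf for _ in range(2)) for _ in range(n)]
        mu = sum(g) / n
        # guard against a monomorphic column (sd == 0): it becomes all zeros
        sd = math.sqrt(sum((v - mu) ** 2 for v in g) / n) or 1.0
        cols.append([(v - mu) / sd for v in g])
    support = sorted(rng.sample(range(p), s))
    beta = [0.0] * p
    for j in support:
        beta[j] = rng.choice([-1.0, 1.0])
    # genetic value X @ beta, summed only over the nonzero coefficients
    g_val = [sum(beta[j] * cols[j][i] for j in support) for i in range(n)]
    var_g = sum(v * v for v in g_val) / n  # columns are centered, so g_val has mean 0
    # choose noise variance so that var_g / (var_g + var_noise) == h2
    noise_sd = math.sqrt(var_g * (1.0 - h2) / h2) if h2 < 1.0 else 0.0
    y = [gv + rng.gauss(0.0, noise_sd) for gv in g_val]
    return cols, y, beta, support


cols, y, beta, support = simulate_gwas(n=50, p=200, s=5, h2=0.5, seed=1)
print(len(cols), "markers,", len(y), "samples, causal markers:", support)
```

With h2 = 1 the phenotype is exactly the genetic value, which is the noise-free case the results below describe as having a sharp phase transition.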

RESULTS

Using CS theory, we show that all markers with nonzero coefficients can be identified (selected) using an efficient algorithm, provided that they are sufficiently few in number (sparse) relative to sample size. For heritability equal to one (h² = 1), there is a sharp phase transition from poor performance to complete selection as the sample size is increased. For heritability below one, complete selection still occurs, but the transition is smoothed. We find for h² ≈ 0.5 that a sample size of approximately thirty times the number of markers with nonzero coefficients is sufficient for full selection. This boundary is only weakly dependent on the number of genotyped markers.
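The abstract does not name its "efficient algorithm" at this point. A standard compressed-sensing choice for selection is L1-penalized least squares (the lasso), and the Python sketch below assumes that choice, solving it by cyclic coordinate descent on standardized markers. It then checks support recovery on a noise-free (h² = 1) toy problem with p = 100 markers and n = 60 samples. The function names, the penalty λ = 0.05, the random seed, and the problem sizes are all hypothetical.

```python
import math
import random


def soft_threshold(z, lam):
    """Proximal operator of lam * |b|: shrink z toward zero by lam, clipping at zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(cols, y, lam, max_sweeps=500, tol=1e-8):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is stored as a list of columns. Each column is assumed standardized so that
    (1/n) * x_j . x_j == 1; under that assumption every coordinate update is a
    single soft-threshold of the marker's correlation with the partial residual.
    """
    n, p = len(y), len(cols)
    b = [0.0] * p
    r = list(y)  # residual y - X b, starting from b = 0
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            xj = cols[j]
            # (1/n) x_j . (r + x_j b_j) simplifies to (1/n) x_j . r + b_j for a standardized column
            rho = sum(x * ri for x, ri in zip(xj, r)) / n + b[j]
            new = soft_threshold(rho, lam)
            delta = new - b[j]
            if delta != 0.0:
                r = [ri - delta * x for ri, x in zip(r, xj)]
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def standardize(v):
    """Center a column and scale it to unit population variance."""
    n = len(v)
    mu = sum(v) / n
    sd = math.sqrt(sum((a - mu) ** 2 for a in v) / n)
    return [(a - mu) / sd for a in v]


# Toy noise-free problem (h^2 = 1): more markers (p) than samples (n), s = 4 causal markers.
rng = random.Random(0)
n, p = 60, 100
cols = [standardize([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(p)]
support = [3, 17, 42, 88]  # hypothetical causal markers, each with coefficient 1
y = [sum(cols[j][i] for j in support) for i in range(n)]
b = lasso_cd(cols, y, lam=0.05)
selected = [j for j, bj in enumerate(b) if bj != 0.0]
print("selected markers:", selected)
```

Soft-thresholding sets coefficients exactly to zero, so the selected set is simply the nonzero entries of `b`; the true coefficients come back slightly shrunk below 1, the usual lasso bias. For comparison, the abstract's rule of thumb for h² ≈ 0.5 (n ≈ 30 × s) would suggest about 30 × 4 = 120 samples for the same s = 4; this toy problem is noise-free, so it does not test that boundary.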

CONCLUSION

Practical measures of signal recovery are robust to linkage disequilibrium between a true causal variant and markers residing in the same genomic region. Given a limited sample size, it is possible to discover a phase transition by increasing the penalization; in this case a subset of the support may be recovered. Applying this approach to the GWAS analysis of height, we show that 70-100% of the selected markers are strongly correlated with height-associated markers identified by the GIANT Consortium.
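A selected marker in strong linkage disequilibrium with a causal variant carries nearly the same signal, so an LD-aware recovery measure can credit it as a hit instead of a false positive. The Python sketch below shows one such measure. It assumes, hypothetically, that a selected marker counts as correct when its absolute Pearson correlation with some true nonzero marker is at least `r_min`. The 0.5 default, the function names, and the toy genotype columns are illustrative and are not the definitions used in the paper.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def tags(cols, j, k, r_min):
    """True if marker j is in LD with marker k at |r| >= r_min (hypothetical criterion)."""
    return abs(pearson(cols[j], cols[k])) >= r_min


def ld_aware_precision(cols, selected, support, r_min=0.5):
    """Fraction of selected markers that tag at least one true nonzero marker."""
    if not selected:
        return 0.0  # convention chosen here: selecting nothing scores zero precision
    hits = sum(1 for j in selected if any(tags(cols, j, k, r_min) for k in support))
    return hits / len(selected)


def ld_aware_recall(cols, selected, support, r_min=0.5):
    """Fraction of true nonzero markers tagged by at least one selected marker."""
    if not support:
        return 0.0
    found = sum(1 for k in support if any(tags(cols, j, k, r_min) for j in selected))
    return found / len(support)


# Marker 1 is a perfect LD proxy for causal marker 0; marker 2 is uncorrelated with it.
cols = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [1, 2, 1, 0, 1, 1],
]
print(ld_aware_precision(cols, selected=[1, 2], support=[0]))
print(ld_aware_recall(cols, selected=[1, 2], support=[0]))
```

An exact-index criterion would score proxy marker 1 as a false positive and report zero recall; the correlation-based version credits it, which is the sense in which such measures tolerate LD between a causal variant and nearby markers.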
