
Applying compressed sensing to genome-wide association studies.

Affiliations

Mathematical Biology Section, Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, South Drive, Bethesda, MD 20814, USA.

Mathematical Biology Section, Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, South Drive, Bethesda, MD 20814, USA; Department of Psychology, University of Minnesota Twin Cities, 75 East River Parkway, Minneapolis, MN 55455, USA; Cognitive Genomics Lab, BGI Shenzhen, Yantian District, Shenzhen, China.

Publication information

Gigascience. 2014 Jun 16;3:10. doi: 10.1186/2047-217X-3-10. eCollection 2014.

DOI: 10.1186/2047-217X-3-10
PMID: 25002967
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4078394/
Abstract

BACKGROUND

The aim of a genome-wide association study (GWAS) is to isolate DNA markers for variants affecting phenotypes of interest. This is constrained by the fact that the number of markers often far exceeds the number of samples. Compressed sensing (CS) is a body of theory regarding signal recovery when the number of predictor variables (i.e., genotyped markers) exceeds the sample size. Its applicability to GWAS has not been investigated.

RESULTS

Using CS theory, we show that all markers with nonzero coefficients can be identified (selected) using an efficient algorithm, provided that they are sufficiently few in number (sparse) relative to sample size. For heritability equal to one (h² = 1), there is a sharp phase transition from poor performance to complete selection as the sample size is increased. For heritability below one, complete selection still occurs, but the transition is smoothed. We find for h² ≈ 0.5 that a sample size of approximately thirty times the number of markers with nonzero coefficients is sufficient for full selection. This boundary is only weakly dependent on the number of genotyped markers.
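The selection behaviour described above can be explored with a small simulation. The sketch below is illustrative only and rests on several assumptions that the abstract does not state: it takes L1-penalized regression (the lasso) as the "efficient algorithm", solves it with a plain proximal-gradient loop, and uses Gaussian random columns as a stand-in for standardized genotypes. The function names (`lasso_ista`, `simulate`, `selection_summary`) and all parameter values are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Approximately minimize (1/2n)||y - Xb||^2 + lam*||b||_1 (ISTA)."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - step * grad, step * lam)
    return b

def simulate(n, p, s, h2, rng):
    """Assumed toy model: Gaussian 'genotypes', s markers with effect +/-1."""
    X = rng.standard_normal((n, p))
    support = rng.choice(p, size=s, replace=False)
    beta = np.zeros(p)
    beta[support] = rng.choice([-1.0, 1.0], size=s)
    g = X @ beta                                   # genetic value
    noise_sd = np.sqrt(g.var() * (1.0 - h2) / h2)  # zero when h2 == 1
    y = g + noise_sd * rng.standard_normal(n)
    return X, (y - y.mean()) / y.std(), support

def selection_summary(n, p=500, s=10, h2=1.0, lam=0.1, seed=0):
    """Return (fraction of true markers selected, number of false positives)."""
    rng = np.random.default_rng(seed)
    X, y, support = simulate(n, p, s, h2, rng)
    selected = set(np.flatnonzero(lasso_ista(X, y, lam)).tolist())
    truth = set(support.tolist())
    return len(selected & truth) / s, len(selected - truth)

if __name__ == "__main__":
    # Sweep the sample size at fixed sparsity and marker count.
    for n in (25, 50, 100, 200, 300):
        tp, fp = selection_summary(n)
        print(f"n={n:4d}  true markers selected={tp:.2f}  false positives={fp}")
```

Rerunning the sweep with `h2=0.5` and a larger `n` relative to `s` is one way to probe the thirty-to-one guideline quoted above, though this toy setup is not expected to reproduce the paper's figures.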

CONCLUSION

Practical measures of signal recovery are robust to linkage disequilibrium between a true causal variant and markers residing in the same genomic region. Given a limited sample size, it is possible to discover a phase transition by increasing the penalization; in this case a subset of the support may be recovered. Applying this approach to the GWAS analysis of height, we show that 70-100% of the selected markers are strongly correlated with height-associated markers identified by the GIANT Consortium.
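The effect of increasing the penalization at a fixed, limited sample size can be sketched in the same spirit. This is a hedged illustration, not the authors' procedure: it assumes an L1 penalty, a Gaussian stand-in for genotypes, and a fixed number of proximal-gradient iterations, and the names `lasso_ista` and `penalty_sweep` are hypothetical. Penalties are expressed as fractions of `lam_max`, the smallest penalty at which the all-zero fit is optimal.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Approximately minimize (1/2n)||y - Xb||^2 + lam*||b||_1 (ISTA)."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y) / n)
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return b

def penalty_sweep(n=100, p=500, s=30, h2=0.5, seed=1,
                  fractions=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Return rows of (fraction of lam_max, markers selected, true markers among them)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))            # assumed stand-in for genotypes
    support = rng.choice(p, size=s, replace=False)
    beta = np.zeros(p)
    beta[support] = rng.choice([-1.0, 1.0], size=s)
    g = X @ beta
    y = g + np.sqrt(g.var() * (1.0 - h2) / h2) * rng.standard_normal(n)
    y = (y - y.mean()) / y.std()
    lam_max = np.max(np.abs(X.T @ y)) / n      # above this, nothing is selected
    truth = set(support.tolist())
    rows = []
    for f in fractions:
        selected = set(np.flatnonzero(lasso_ista(X, y, f * lam_max)).tolist())
        rows.append((f, len(selected), len(selected & truth)))
    return rows

if __name__ == "__main__":
    for f, n_selected, n_true in penalty_sweep():
        print(f"lam = {f:.1f} * lam_max  selected = {n_selected:3d}  true = {n_true:3d}")
```

Here the sample size (100) is small relative to the number of nonzero markers (30), so full selection is not expected; the printout lets a reader see how the size of the selected set and the number of true markers in it change as the penalty grows.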


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/354c/4078394/46d501b2ba8e/2047-217X-3-10-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/354c/4078394/917c736e1331/2047-217X-3-10-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/354c/4078394/402dc238273a/2047-217X-3-10-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/354c/4078394/e19ed2598c01/2047-217X-3-10-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/354c/4078394/0018feb7afd1/2047-217X-3-10-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/354c/4078394/0236d902b947/2047-217X-3-10-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/354c/4078394/4b02d0618b01/2047-217X-3-10-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/354c/4078394/e2ce5ff5bb53/2047-217X-3-10-8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/354c/4078394/8f4330bb093d/2047-217X-3-10-9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/354c/4078394/c5e76003938b/2047-217X-3-10-10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/354c/4078394/dd3fc46ef463/2047-217X-3-10-11.jpg
