Determination of nonlinear genetic architecture using compressed sensing.

Author information

Ho Chiu Man, Hsu Stephen D H

Affiliation

Department of Physics and Astronomy, Michigan State University, 567 Wilson Road, East Lansing, MI 48824, USA.

Publication information

Gigascience. 2015 Sep 14;4:44. doi: 10.1186/s13742-015-0081-6. eCollection 2015.

DOI:10.1186/s13742-015-0081-6

PMID:26380078

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4570224/

Abstract

BACKGROUND

One of the fundamental problems of modern genomics is to extract the genetic architecture of a complex trait from a data set of individual genotypes and trait values. Establishing this important connection between genotype and phenotype is complicated by the large number of candidate genes, the potentially large number of causal loci, and the likely presence of some nonlinear interactions between different genes. Compressed Sensing methods obtain solutions to under-constrained systems of linear equations. These methods can be applied to the problem of determining the best model relating genotype to phenotype, and generally deliver better performance than simply regressing the phenotype against each genetic variant, one at a time. We introduce a Compressed Sensing method that can reconstruct nonlinear genetic models (i.e., including epistasis, or gene-gene interactions) from phenotype-genotype (GWAS) data. Our method uses L1-penalized regression applied to nonlinear functions of the sensing matrix.
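
A minimal sketch of the idea in the last sentence above, assuming that the "nonlinear functions of the sensing matrix" are pairwise products of standardized genotype columns and that the L1-penalized regression is a plain lasso solved by cyclic coordinate descent. The simulated genotypes, allele frequencies, effect sizes, noise level, the penalty `lam=0.1`, and the helper names `standardize`, `build_features`, and `lasso_cd` are hypothetical illustrations, not the authors' code or settings. With 25 SNPs the expanded design has 325 candidate terms but only 200 individuals, so the linear system is under-determined; the three true terms (two additive loci and one interaction) should still dominate the fitted coefficients.

```python
import itertools
import math
import random


def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    centered = [v - mean for v in col]
    sd = math.sqrt(sum(v * v for v in centered) / n)
    return [v / sd for v in centered]


def build_features(genotypes):
    """Standardized SNP columns plus standardized pairwise products (epistasis terms)."""
    z = [standardize(g) for g in genotypes]
    names = [(i,) for i in range(len(z))]
    cols = list(z)
    for i, j in itertools.combinations(range(len(z)), 2):
        names.append((i, j))
        cols.append(standardize([a * b for a, b in zip(z[i], z[j])]))
    return names, cols


def soft_threshold(x, lam):
    return math.copysign(max(abs(x) - lam, 0.0), x)


def lasso_cd(cols, y, lam, max_sweeps=200, tol=1e-6):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes every column satisfies (1/n)*||x_j||^2 == 1 (i.e. it is standardized).
    """
    n = len(y)
    beta = [0.0] * len(cols)
    resid = list(y)  # y - X @ beta, with beta = 0 initially
    for _ in range(max_sweeps):
        max_change = 0.0
        for j, x in enumerate(cols):
            rho = beta[j] + sum(a * r for a, r in zip(x, resid)) / n
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * a for r, a in zip(resid, x)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Hypothetical toy data: 200 individuals, 25 SNPs -> 25 + 300 = 325 candidate terms.
rng = random.Random(0)
n, p = 200, 25
freqs = [rng.uniform(0.2, 0.5) for _ in range(p)]
genotypes = [[(rng.random() < f) + (rng.random() < f) for _ in range(n)] for f in freqs]
names, cols = build_features(genotypes)
index = {name: k for k, name in enumerate(names)}

# Sparse nonlinear "truth": two additive loci and one pairwise interaction.
truth = {(0,): 1.0, (3,): 1.0, (1, 2): 1.0}
y = [sum(eff * cols[index[name]][row] for name, eff in truth.items()) + rng.gauss(0.0, 0.1)
     for row in range(n)]
y_mean = sum(y) / n
y = [v - y_mean for v in y]

beta = lasso_cd(cols, y, lam=0.1)
top3 = sorted(range(len(beta)), key=lambda k: -abs(beta[k]))[:3]
recovered = {names[k] for k in top3}
print("recovered terms:", sorted(recovered))
```

Standardizing every column gives (1/n)‖x_j‖² = 1, so each coordinate update is a single soft-threshold step; centering the SNP columns before forming products also keeps each interaction term roughly uncorrelated with its two parent main effects.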

RESULTS

The computational and data resource requirements for our method are similar to those necessary for reconstruction of linear genetic models (or identification of gene-trait associations), assuming a condition of generalized sparsity, which limits the total number of gene-gene interactions. An example of a sparse nonlinear model is one in which a typical locus interacts with several or even many others, but only a small subset of all possible interactions exist. It seems plausible that most genetic architectures fall in this category. We give theoretical arguments suggesting that the method is nearly optimal in performance, and demonstrate its effectiveness on broad classes of nonlinear genetic models using simulated human genomes and the small amount of currently available real data. A phase transition (i.e., dramatic and qualitative change) in the behavior of the algorithm indicates when sufficient data is available for its successful application.
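
The phase transition can be illustrated qualitatively with a toy experiment. This sketch makes simplifying assumptions that do not come from the paper: a purely linear model with s = 5 nonzero coefficients among p = 200 standardized Gaussian features (instead of genotypes and interaction terms), a fixed hypothetical penalty `lam=0.05`, and "success" defined as the s largest fitted coefficients landing exactly on the true support. Increasing the number of individuals n should show recovery going from failing to reliably succeeding once n is large enough relative to s.

```python
import math
import random


def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    centered = [v - mean for v in col]
    sd = math.sqrt(sum(v * v for v in centered) / n)
    return [v / sd for v in centered]


def lasso_cd(cols, y, lam, max_sweeps=300, tol=1e-5):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 (standardized columns)."""
    n = len(y)
    beta = [0.0] * len(cols)
    resid = list(y)
    for _ in range(max_sweeps):
        max_change = 0.0
        for j, x in enumerate(cols):
            rho = beta[j] + sum(a * r for a, r in zip(x, resid)) / n
            new = math.copysign(max(abs(rho) - lam, 0.0), rho)  # soft threshold
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * a for r, a in zip(resid, x)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def trial(n, p, s, rng, lam=0.05, noise=0.05):
    """One random sparse problem; True if the s largest |coefficients| hit the true support."""
    cols = [standardize([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(p)]
    support = rng.sample(range(p), s)
    coef = {j: rng.choice((-1.0, 1.0)) for j in support}
    y = [sum(c * cols[j][row] for j, c in coef.items()) + rng.gauss(0.0, noise)
         for row in range(n)]
    mean = sum(y) / n
    y = [v - mean for v in y]
    beta = lasso_cd(cols, y, lam)
    top = sorted(range(p), key=lambda k: -abs(beta[k]))[:s]
    return set(top) == set(support)


rng = random.Random(1)
p, s, trials = 200, 5, 5
rates = {}
for n in (10, 25, 50, 100):
    rates[n] = sum(trial(n, p, s, rng) for _ in range(trials)) / trials
    print(f"n = {n:3d} (n/p = {n / p:.2f}): support recovered in {rates[n]:.0%} of trials")
```

All sample sizes here stay below p, so every problem is under-determined; the change in behavior comes from n growing relative to s, not from the system becoming fully determined.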

CONCLUSION

Our results indicate that predictive models for many complex traits, including a variety of human disease susceptibilities (e.g., with additive heritability h² ∼ 0.5), can be extracted from data sets composed of n* ∼ 100 × s individuals, where s is the number of distinct causal variants influencing the trait. For example, given a trait controlled by ∼10k loci, roughly a million individuals would be sufficient for application of the method.
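
The sample-size rule of thumb in the conclusion can be written out explicitly. The constant 100 is the approximate value quoted above for traits with h² ∼ 0.5, so this is an order-of-magnitude estimate rather than an exact bound:

```latex
n^{*} \sim 100\, s, \qquad s \sim 10^{4}
\;\Longrightarrow\;
n^{*} \sim 100 \times 10^{4} = 10^{6}.
```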

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f0c/4570224/2441fcff0034/13742_2015_81_Fig1_HTML.jpg
