Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
Scientific Computing and Research Support Unit, University of Lausanne, Lausanne, Switzerland.
Nat Commun. 2021 Nov 30;12(1):6972. doi: 10.1038/s41467-021-27258-9.
We develop a Bayesian model (BayesRR-RC) that provides robust SNP-heritability estimation, an alternative to marker discovery, and accurate genomic prediction, taking 22 seconds per iteration to estimate 8.4 million SNP effects and 78 SNP-heritability parameters in the UK Biobank. We find that at most 10% of the genetic variation captured for height, body mass index, cardiovascular disease, and type 2 diabetes is attributable to proximal regulatory regions within 10 kb upstream of genes, while 12-25% is attributed to coding regions, 32-44% to introns, and 22-28% to distal 10-500 kb upstream regions. Up to 24% of all cis and coding regions of each chromosome are associated with each trait, with over 3,100 independent exonic and intronic regions and over 5,400 independent regulatory regions having ≥95% probability of contributing ≥0.001% to the genetic variance of these four traits. Our open-source software (GMRM) provides a scalable alternative to current approaches for biobank data.
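The "≥95% probability of contributing ≥0.001% to the genetic variance" criterion can be read as a posterior probability computed over sampler iterations. The following is a minimal sketch of that idea, not the GMRM implementation. It assumes standardized and approximately uncorrelated markers, so that a region's variance in each posterior sample is approximated by the sum of its squared SNP effects (linkage disequilibrium is ignored). The function name `region_ppwv`, its arguments, and the toy data are hypothetical.

```python
import numpy as np


def region_ppwv(beta_samples, region_index, threshold=1e-5):
    """Posterior probability that each region explains >= threshold of the
    genetic variance (threshold=1e-5 corresponds to 0.001%).

    beta_samples : (n_iterations, n_snps) posterior samples of SNP effects.
    region_index : (n_snps,) integer region label (0..R-1) for each SNP.

    Assumption (not from the source): markers are standardized and treated
    as uncorrelated, so per-sample variance is the sum of squared effects.
    """
    beta_samples = np.asarray(beta_samples, dtype=float)
    region_index = np.asarray(region_index)

    per_snp_var = beta_samples ** 2           # variance contribution per SNP
    total_var = per_snp_var.sum(axis=1)       # genetic variance per iteration

    n_regions = int(region_index.max()) + 1
    probs = np.zeros(n_regions)
    for r in range(n_regions):
        region_var = per_snp_var[:, region_index == r].sum(axis=1)
        # Share of genetic variance per iteration; 0 where the total is 0.
        share = np.divide(region_var, total_var,
                          out=np.zeros_like(total_var),
                          where=total_var > 0)
        # Fraction of posterior samples in which the share meets the threshold.
        probs[r] = np.mean(share >= threshold)
    return probs


# Toy usage with synthetic "posterior samples" (illustrative only).
rng = np.random.default_rng(0)
toy_beta = rng.normal(scale=0.01, size=(200, 6))
toy_beta[:, 4:] = 0.0                          # region 2 never contributes
toy_regions = np.array([0, 0, 1, 1, 2, 2])
toy_probs = region_ppwv(toy_beta, toy_regions)
significant = toy_probs >= 0.95                # regions passing the 95% cut-off
```

A region would be counted as associated under this reading when its entry in `toy_probs` is at least 0.95. A faithful version would compute region variance from the genetic values implied by the genotypes rather than from squared effects alone.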