Genome-wide prediction using Bayesian additive regression trees.

Author information

Waldmann Patrik

Affiliation

Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences (SLU), Box 7023, 750 07, Uppsala, Sweden.

Publication information

Genet Sel Evol. 2016 Jun 10;48(1):42. doi: 10.1186/s12711-016-0219-8.

Abstract

BACKGROUND

The goal of genome-wide prediction (GWP) is to predict phenotypes based on marker genotypes, often obtained through single nucleotide polymorphism (SNP) chips. The major problem with GWP is the high dimensionality of the data, with many thousands of SNPs scored on several thousand individuals. A large number of methods have been developed for GWP, most of which are parametric and assume statistical linearity and only additive genetic effects. The Bayesian additive regression trees (BART) method was recently proposed and is based on a sum of nonparametric regression trees, with priors used to regularize the parameters. Each regression tree is based on a recursive binary partitioning of the predictor space that approximates an unknown function, which automatically models nonlinearities within SNPs (dominance) and interactions between SNPs (epistasis). In this study, we introduced BART and compared its predictive performance with that of the LASSO, Bayesian LASSO (BLASSO), genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space (RKHS) regression and random forest (RF) methods.
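The sum-of-trees idea described above can be illustrated with a small sketch. It is not the BART implementation used in the study: the tree shapes, the SNP names (`snp_a`, `snp_b`, `snp_c`), the leaf values, and the 0/1/2 genotype coding are all assumptions chosen for illustration. It only shows how a binary split within one SNP can express dominance and how nested splits on two SNPs can express epistasis, with the prediction formed as the sum over trees.

```python
# Illustrative sum-of-trees predictor on SNP genotypes coded 0/1/2.
# Tree shapes, SNP names, and leaf values are invented for this sketch.

def leaf(value):
    """A terminal node holding a constant contribution to the prediction."""
    return {"leaf": value}

def split(snp, threshold, left, right):
    """An internal node: go left when genotype[snp] < threshold, else right."""
    return {"snp": snp, "threshold": threshold, "left": left, "right": right}

def predict_tree(tree, genotype):
    """Follow the binary splits until a leaf is reached."""
    while "leaf" not in tree:
        branch = "left" if genotype[tree["snp"]] < tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]

def predict_sum_of_trees(trees, genotype):
    """The model output is the sum of the individual tree outputs."""
    return sum(predict_tree(t, genotype) for t in trees)

# Tree 1: dominance at snp_a, one or two copies give the same effect.
dominance_tree = split("snp_a", 1, leaf(0.0), leaf(1.0))

# Tree 2: epistasis, snp_b has an effect only when snp_c carries the allele.
epistasis_tree = split(
    "snp_c", 1,
    leaf(0.0),
    split("snp_b", 1, leaf(0.0), leaf(0.5)),
)

trees = [dominance_tree, epistasis_tree]
```

For example, `predict_sum_of_trees(trees, {"snp_a": 2, "snp_b": 1, "snp_c": 1})` adds the dominance effect and the interaction effect, whereas the same genotype with `snp_c` set to 0 receives only the dominance effect.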

RESULTS

Tests on the QTLMAS2010 simulated data, which are mainly based on additive genetic effects, showed that cross-validated optimization of BART provided a smaller prediction error than the RF, BLASSO, GBLUP and RKHS methods, and was almost as accurate as the LASSO method. When dominance and epistasis effects were added to the QTLMAS2010 data, the accuracy of BART relative to the other methods increased. We also showed that BART can produce importance measures on the SNPs through variable inclusion proportions. In evaluations using real data on pigs, the prediction error was smaller with BART than with the other methods.
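As a rough illustration of variable inclusion proportions, the sketch below counts how often each SNP appears in a splitting rule across posterior draws of the tree ensemble and divides by the total number of splitting rules. The data layout (a list of draws, each a list of trees, each a list of SNP names) and the function name `variable_inclusion_proportions` are assumptions made for this example, and the draws are invented rather than taken from the study.

```python
from collections import Counter

def variable_inclusion_proportions(posterior_draws):
    """Share of all splitting rules that use each SNP.

    posterior_draws: list of draws; each draw is a list of trees;
    each tree is a list of the SNP names used in its splitting rules.
    (Hypothetical layout, chosen only for this sketch.)
    """
    counts = Counter(
        snp for draw in posterior_draws for tree in draw for snp in tree
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # no splitting rules at all, so no proportions to report
    return {snp: n / total for snp, n in counts.items()}

# Two invented posterior draws of a two-tree ensemble.
draws = [
    [["snp_1", "snp_7"], ["snp_1"]],
    [["snp_1"], ["snp_3", "snp_7"]],
]
vip = variable_inclusion_proportions(draws)
```

The proportions sum to one, so a SNP that is selected in many splitting rules stands out against SNPs that are rarely or never used.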

CONCLUSIONS

BART was shown to be an accurate method for GWP, in which the regression trees guarantee a very sparse representation of additive and complex non-additive genetic effects. Moreover, the Markov chain Monte Carlo algorithm with Bayesian back-fitting provides a computationally efficient procedure that is suitable for high-dimensional genomic data.
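The back-fitting step can be sketched in a simplified, deterministic form. In BART each tree is updated by an MCMC draw conditional on the partial residuals left by all other trees; the sketch below keeps only the residual cycling and swaps the MCMC draw for a greedy least-squares fit of a one-split tree (a stump). The function names, the toy genotypes, and the toy phenotypes are assumptions for illustration, not the procedure or data from the study.

```python
def fit_stump(xs, residuals):
    """Least-squares stump: the best single split over all predictors."""
    mean_all = sum(residuals) / len(residuals)
    sse_all = sum((r - mean_all) ** 2 for r in residuals)
    best = (sse_all, None, None, mean_all, mean_all)  # fallback: no split
    for j in range(len(xs[0])):
        # Skipping the smallest value keeps both sides of the split non-empty.
        for t in sorted({x[j] for x in xs})[1:]:
            left = [r for x, r in zip(xs, residuals) if x[j] < t]
            right = [r for x, r in zip(xs, residuals) if x[j] >= t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]

def predict_stump(stump, x):
    j, t, ml, mr = stump
    if j is None:
        return ml
    return ml if x[j] < t else mr

def backfit(xs, ys, n_trees=3, n_sweeps=5):
    """Cycle over trees, refitting each one to its partial residuals."""
    stumps = [(None, None, 0.0, 0.0)] * n_trees
    for _ in range(n_sweeps):
        for k in range(n_trees):
            # Partial residual: remove the fit of every tree except tree k.
            partial = [
                y - sum(predict_stump(s, x) for i, s in enumerate(stumps) if i != k)
                for x, y in zip(xs, ys)
            ]
            stumps[k] = fit_stump(xs, partial)
    return stumps

# Toy data: two SNPs coded 0/1/2, each with a dominant effect (invented).
xs = [(a, b) for a in (0, 1, 2) for b in (0, 1, 2)]
ys = [1.0 * (a >= 1) + 0.5 * (b >= 1) for a, b in xs]
stumps = backfit(xs, ys)
fitted = [sum(predict_stump(s, x) for s in stumps) for x in xs]
```

Because each tree only has to explain what the other trees leave unexplained, each update works on one small tree at a time, which is the property the back-fitting scheme relies on.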

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6950/4901500/b767c1546dca/12711_2016_219_Fig1_HTML.jpg
