A hybrid expectation maximisation and MCMC sampling algorithm to implement Bayesian mixture model based genomic prediction and QTL mapping.

Author information

Wang Tingting, Chen Yi-Ping Phoebe, Bowman Phil J, Goddard Michael E, Hayes Ben J

Affiliations

School of Engineering and Mathematical Sciences, La Trobe University, Melbourne, VIC, Australia.

Biosciences Research, Department of Economic Development, Jobs, Transport and Resources, Bundoora, Melbourne, VIC, Australia.

Publication information

BMC Genomics. 2016 Sep 21;17(1):744. doi: 10.1186/s12864-016-3082-7.

DOI:10.1186/s12864-016-3082-7

PMID:27654580

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5031345/

Abstract

BACKGROUND

Bayesian mixture models in which the effects of SNPs are assumed to come from normal distributions with different variances are attractive for simultaneous genomic prediction and QTL mapping. These models are usually implemented with Markov chain Monte Carlo (MCMC) sampling, which requires long compute times with large genomic data sets. Here, we present an efficient approach (termed HyB_BR), a hybrid in which an Expectation-Maximisation algorithm is followed by a limited number of MCMC iterations, without the requirement for burn-in.
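The abstract gives no algorithmic detail, so the following is only a toy sketch of the general idea: start a Gibbs sampler for a normal-mixture prior from a cheap point estimate and keep every sample, with no burn-in discarded. Everything specific here is an assumption made for illustration and is not taken from the paper: a ridge-regression estimate stands in for the Expectation-Maximisation stage, the component variances, mixing proportions and residual variance are fixed at hypothetical values, and the function names `ridge_start`, `gibbs_mixture` and `demo` are invented.

```python
import numpy as np

def ridge_start(X, y, lam):
    """Ridge point estimate, used here as a stand-in for the EM stage (assumption)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def gibbs_mixture(X, y, b_init, variances, mix_props, sigma_e2, n_iter, rng):
    """Gibbs sampler for SNP effects under a mixture of zero-mean normals.

    Starts from b_init and averages over every iteration (no burn-in).
    Returns posterior mean effects and, per SNP, the fraction of
    iterations in which the effect was non-zero.
    """
    n, p = X.shape
    variances = np.asarray(variances, dtype=float)
    log_pi = np.log(np.asarray(mix_props, dtype=float))
    nz = variances > 0                    # components with a non-zero variance
    xx = np.einsum("ij,ij->j", X, X)      # x_j' x_j for each SNP
    b = np.array(b_init, dtype=float)
    resid = y - X @ b
    b_sum = np.zeros(p)
    nonzero_count = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]       # take SNP j out of the current fit
            rhs = X[:, j] @ resid
            # Log marginal likelihood of each component, relative to the
            # zero-variance (no effect) component, whose value is 0.
            log_like = np.zeros(len(variances))
            denom = xx[j] + sigma_e2 / variances[nz]
            log_like[nz] = (-0.5 * np.log1p(variances[nz] * xx[j] / sigma_e2)
                            + 0.5 * rhs ** 2 / (sigma_e2 * denom))
            log_post = log_pi + log_like
            prob = np.exp(log_post - log_post.max())
            prob /= prob.sum()
            k = rng.choice(len(variances), p=prob)
            if variances[k] > 0:
                d = xx[j] + sigma_e2 / variances[k]
                b[j] = rng.normal(rhs / d, np.sqrt(sigma_e2 / d))
                nonzero_count[j] += 1
            else:
                b[j] = 0.0
            resid -= X[:, j] * b[j]       # put SNP j back into the fit
        b_sum += b
    return b_sum / n_iter, nonzero_count / n_iter

def demo(seed=0):
    """Simulate 200 individuals and 20 SNPs with one causal SNP (index 0)."""
    rng = np.random.default_rng(seed)
    n, p = 200, 20
    X = rng.normal(size=(n, p))
    true_b = np.zeros(p)
    true_b[0] = 1.0
    y = X @ true_b + rng.normal(size=n)
    b0 = ridge_start(X, y, lam=10.0)
    return gibbs_mixture(X, y, b0,
                         variances=[0.0, 0.01, 0.1, 1.0],    # hypothetical values
                         mix_props=[0.9, 0.05, 0.03, 0.02],  # hypothetical values
                         sigma_e2=1.0, n_iter=200, rng=rng)
```

Calling `post_mean, pip = demo()` returns the estimated effects and, for each SNP, the fraction of samples with a non-zero effect, which plays the role of the "posterior probability of a non-zero effect" used for QTL mapping. A real implementation would also update the variance components and mixing proportions rather than fixing them.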

RESULTS

To test prediction accuracy from HyB_BR, dairy cattle and human disease trait data were used. In the dairy cattle data, there were four quantitative traits (milk volume, protein kg, fat% in milk and fertility) measured in 16,214 cattle from two breeds genotyped for 632,002 SNPs. Genomic predictions were validated either in a subset of cattle from the reference set or in animals from a third breed that was not in the reference set. In all cases, HyB_BR gave almost identical accuracies to Bayesian mixture models implemented with full MCMC; however, computational time was reduced to as little as 1/17 of that required by full MCMC. The SNPs with high posterior probability of a non-zero effect were also very similar between full MCMC and HyB_BR, with several known genes affecting milk production in this category, as well as some novel genes. HyB_BR was also applied to seven human diseases with 4890 individuals genotyped for around 300 K SNPs in a case/control design, from the Wellcome Trust Case Control Consortium (WTCCC). In this data set, the results again demonstrated that HyB_BR performed as well as Bayesian mixture models with full MCMC for genomic prediction and genetic architecture inference, while reducing the computational time from 45 h with full MCMC to 3 h with HyB_BR.
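As a quick arithmetic check on the timings reported for the human disease data, the implied speed-up is:

```latex
\text{speed-up} = \frac{45\ \text{h}}{3\ \text{h}} = 15
```

This 15-fold reduction sits within the "up to 17 times" figure quoted for HyB_BR overall.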

CONCLUSIONS

The results for quantitative traits in cattle and disease in humans demonstrate that HyB_BR performs as well as Bayesian mixture models implemented with full MCMC in terms of prediction accuracy, while running up to 17 times faster than the full MCMC implementations. The HyB_BR algorithm makes simultaneous genomic prediction, QTL mapping and inference of genetic architecture feasible in large genomic data sets.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fdc/5031345/23308f9432fa/12864_2016_3082_Fig1_HTML.jpg



