Efficient genomic prediction based on whole-genome sequence data using split-and-merge Bayesian variable selection.

Authors

Calus Mario P L, Bouwman Aniek C, Schrooten Chris, Veerkamp Roel F

Affiliations

Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, PO Box 338, 6700 AH, Wageningen, The Netherlands.

CRV BV, 6800 AL, Arnhem, The Netherlands.

Publication information

Genet Sel Evol. 2016 Jun 29;48(1):49. doi: 10.1186/s12711-016-0225-x.

DOI:10.1186/s12711-016-0225-x
PMID:27357580
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4926307/
Abstract

BACKGROUND

Use of whole-genome sequence data is expected to increase persistency of genomic prediction across generations and breeds but affects model performance and requires increased computing time. In this study, we investigated whether the split-and-merge Bayesian stochastic search variable selection (BSSVS) model could overcome these issues. BSSVS is performed first on subsets of sequence-based variants and then on a merged dataset containing variants selected in the first step.
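
The two-step procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Genotypes are assumed to be a plain list of per-animal rows, and subsets are assumed to be contiguous blocks of variant indices. `select_in_subset` is a hypothetical stand-in: instead of running BSSVS, it keeps the variants with the largest absolute covariance with the phenotype.

```python
from typing import List, Sequence

# Assumed layout: one row per animal, one column per variant (e.g. 0/1/2 allele counts).
Genotypes = Sequence[Sequence[float]]


def split_into_subsets(n_variants: int, n_subsets: int) -> List[range]:
    """Partition variant indices 0..n_variants-1 into contiguous, near-equal subsets."""
    base, extra = divmod(n_variants, n_subsets)
    subsets: List[range] = []
    start = 0
    for k in range(n_subsets):
        size = base + (1 if k < extra else 0)  # the first `extra` subsets get one more variant
        subsets.append(range(start, start + size))
        start += size
    return subsets


def select_in_subset(genotypes: Genotypes, phenotypes: Sequence[float],
                     subset: range, keep: int) -> List[int]:
    """HYPOTHETICAL stand-in for per-subset BSSVS: keep the `keep` variants in
    `subset` with the largest absolute covariance with the phenotype."""
    n = len(phenotypes)
    y_mean = sum(phenotypes) / n
    scored = []
    for j in subset:
        col = [row[j] for row in genotypes]
        x_mean = sum(col) / n
        cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(col, phenotypes)) / n
        scored.append((abs(cov), j))
    scored.sort(reverse=True)  # highest score first
    return sorted(j for _, j in scored[:keep])


def split_and_merge(genotypes: Genotypes, phenotypes: Sequence[float],
                    n_subsets: int, keep_per_subset: int) -> List[int]:
    """Step 1: select variants within each subset independently.
    Returns the merged selection that a second-step model would be fitted on."""
    n_variants = len(genotypes[0])
    merged: List[int] = []
    for subset in split_into_subsets(n_variants, n_subsets):
        merged.extend(select_in_subset(genotypes, phenotypes, subset, keep_per_subset))
    return merged


# Toy usage: 4 animals x 4 variants, split into 2 subsets, keeping 1 variant per subset.
genotypes = [[1, 0, 2, 0], [1, 0, 2, 1], [1, 2, 2, 1], [1, 2, 2, 2]]
phenotypes = [0.0, 1.0, 2.0, 3.0]
print(split_and_merge(genotypes, phenotypes, n_subsets=2, keep_per_subset=1))  # [1, 3]
```

Using the study's numbers, `split_into_subsets(4_154_064, 106)` returns 76 subsets of 39,189 variants and 30 subsets of 39,190. This matches the reported ~39,189 variants per subset.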

RESULTS

We used a dataset that included 4,154,064 variants after editing, and de-regressed proofs for 3415 reference and 2138 validation bulls for somatic cell score, protein yield and interval from first to last insemination. In the first step, BSSVS was performed on 106 subsets, each containing ~39,189 variants. In the second step, between 1060 and 472,492 variants selected in the first step were included to estimate the accuracy of genomic prediction. Accuracies were at best equal to those achieved with the commonly used Bovine 50k-SNP chip, although the number of variants within a few well-known quantitative trait loci regions was considerably enriched. When variant selection and the final genomic prediction were performed on the same data, predictions were biased. Predictions computed as the average of the predictions computed for each subset achieved the highest accuracies, i.e. 0.5 to 1.1% higher than the accuracies obtained with the 50k-SNP chip, and yielded the least biased predictions. Finally, the accuracy of genomic predictions obtained when all sequence-based variants were included was similar to, or up to 1.4% lower than, that based on the average predictions across the subsets. By applying parallelization, the split-and-merge procedure was completed in 5 days, while the standard analysis including all sequence-based variants took more than three months.
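
The best-performing strategy above averages the predictions obtained from the individual subsets, which is simple to express. The sketch below takes accuracy as the Pearson correlation between predictions and de-regressed proofs (DRP). It takes bias as the slope of DRP regressed on the predictions, where 1.0 indicates neither inflation nor deflation. These are common conventions assumed here, not definitions quoted from the abstract, and all function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def average_predictions(per_subset: Sequence[Sequence[float]]) -> List[float]:
    """Average each animal's prediction over all subsets.
    per_subset[k][i] is the prediction for animal i from the model fitted on subset k."""
    n_subsets = len(per_subset)
    return [sum(preds) / n_subsets for preds in zip(*per_subset)]


def _centered(values: Sequence[float]) -> List[float]:
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def accuracy(gebv: Sequence[float], drp: Sequence[float]) -> float:
    """Pearson correlation between predictions and DRP (assumed accuracy metric)."""
    g, d = _centered(gebv), _centered(drp)
    return sum(a * b for a, b in zip(g, d)) / sqrt(sum(a * a for a in g) * sum(b * b for b in d))


def regression_slope(gebv: Sequence[float], drp: Sequence[float]) -> float:
    """Slope of DRP regressed on predictions; 1.0 means unbiased scale (assumed bias metric)."""
    g, d = _centered(gebv), _centered(drp)
    return sum(a * b for a, b in zip(g, d)) / sum(a * a for a in g)


# Toy usage: 2 subsets, 3 validation bulls.
gebv = average_predictions([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
print(gebv)                                      # [2.0, 3.0, 4.0]
print(accuracy(gebv, [4.0, 6.0, 8.0]))           # 1.0
print(regression_slope(gebv, [4.0, 6.0, 8.0]))   # 2.0
```

Averaging gives every subset an equal weight. Under the metrics assumed here, a slope above 1.0 would indicate that the averaged predictions are under-dispersed relative to the DRP.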

CONCLUSIONS

The split-and-merge approach splits one large computational task into many much smaller ones, which allows the use of parallel processing and thus efficient genomic prediction based on whole-genome sequence data. The split-and-merge approach did not improve prediction accuracy, probably because we used data on a single breed for which relationships between individuals were high. Nevertheless, the split-and-merge approach may have potential for applications on data from multiple breeds.
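
Each first-step subset is analysed independently, so the subsets can be dispatched concurrently and their selections merged afterwards. The sketch below uses `concurrent.futures.ThreadPoolExecutor` only to show this structure, and `fit_subset` is a hypothetical per-subset job. Threads share CPython's global interpreter lock, so CPU-bound Python work would need `ProcessPoolExecutor` (with a picklable top-level function) or separate cluster jobs to gain real speed-ups.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def run_subsets_in_parallel(subsets: Sequence[range],
                            fit_subset: Callable[[range], List[int]],
                            max_workers: int = 4) -> List[int]:
    """Run one independent job per subset and merge their outputs in subset order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, regardless of completion order.
        results = list(pool.map(fit_subset, subsets))
    merged: List[int] = []
    for selected in results:
        merged.extend(selected)
    return merged


# Toy usage: a hypothetical job that "selects" the first variant of each subset.
subsets = [range(0, 3), range(3, 6), range(6, 9)]
print(run_subsets_in_parallel(subsets, lambda s: [s.start], max_workers=2))  # [0, 3, 6]
```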

Similar articles

1
Efficient genomic prediction based on whole-genome sequence data using split-and-merge Bayesian variable selection.
Genet Sel Evol. 2016 Jun 29;48(1):49. doi: 10.1186/s12711-016-0225-x.
2
Multibreed genomic prediction using multitrait genomic residual maximum likelihood and multitask Bayesian variable selection.
J Dairy Sci. 2018 May;101(5):4279-4294. doi: 10.3168/jds.2017-13366. Epub 2018 Mar 15.
3
Genomic prediction using imputed whole-genome sequence data in Holstein Friesian cattle.
Genet Sel Evol. 2015 Sep 17;47(1):71. doi: 10.1186/s12711-015-0149-x.
4
Genomic prediction of breeding values using previously estimated SNP variances.
Genet Sel Evol. 2014 Sep 25;46(1):52. doi: 10.1186/s12711-014-0052-x.
5
Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect.
Genet Sel Evol. 2017 Sep 21;49(1):70. doi: 10.1186/s12711-017-0347-9.
6
Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein-Friesian cattle.
Genet Sel Evol. 2016 Dec 1;48(1):95. doi: 10.1186/s12711-016-0274-1.
7
Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction.
J Dairy Sci. 2015 Jun;98(6):4107-16. doi: 10.3168/jds.2014-9005. Epub 2015 Apr 16.
8
Design of a low-density SNP chip for the main Australian sheep breeds and its effect on imputation and genomic prediction accuracy.
Anim Genet. 2015 Oct;46(5):544-56. doi: 10.1111/age.12340. Epub 2015 Sep 11.
9
Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels.
J Dairy Sci. 2012 Jul;95(7):4114-29. doi: 10.3168/jds.2011-5019.
10
Accuracy of predicting genomic breeding values for residual feed intake in Angus and Charolais beef cattle.
J Anim Sci. 2013 Oct;91(10):4669-78. doi: 10.2527/jas.2013-5715.

Cited by

1
Incorporating information of causal variants in genomic prediction using GBLUP or machine learning models in a simulated livestock population.
J Anim Sci Biotechnol. 2025 Aug 19;16(1):118. doi: 10.1186/s40104-025-01250-5.
2
EXGEP: a framework for predicting genotype-by-environment interactions using ensembles of explainable machine-learning models.
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf414.
3
Genetic evaluation of productive longevity in a multibreed beef cattle population.
J Anim Sci. 2024 Jan 3;102. doi: 10.1093/jas/skae363.
4
GWABLUP: genome-wide association assisted best linear unbiased prediction of genetic values.
Genet Sel Evol. 2024 Mar 1;56(1):17. doi: 10.1186/s12711-024-00881-y.
5
Genomic prediction based on selective linkage disequilibrium pruning of low-coverage whole-genome sequence variants in a pure Duroc population.
Genet Sel Evol. 2023 Oct 18;55(1):72. doi: 10.1186/s12711-023-00843-w.
6
Utilizing Variants Identified with Multiple Genome-Wide Association Study Methods Optimizes Genomic Selection for Growth Traits in Pigs.
Animals (Basel). 2023 Feb 17;13(4):722. doi: 10.3390/ani13040722.
7
Genomic prediction with whole-genome sequence data in intensely selected pig lines.
Genet Sel Evol. 2022 Sep 24;54(1):65. doi: 10.1186/s12711-022-00756-0.
8
Evaluation of Whole-Genome Sequence Imputation Strategies in Korean Hanwoo Cattle.
Animals (Basel). 2022 Sep 1;12(17):2265. doi: 10.3390/ani12172265.
9
Association Studies and Genomic Prediction for Genetic Improvements in Agriculture.
Front Plant Sci. 2022 Jun 2;13:904230. doi: 10.3389/fpls.2022.904230. eCollection 2022.
10
Increased accuracy of genomic predictions for growth under chronic thermal stress in rainbow trout by prioritizing variants from GWAS using imputed sequence data.
Evol Appl. 2021 May 18;15(4):537-552. doi: 10.1111/eva.13240. eCollection 2022 Apr.