
A comparison of random forests, boosting and support vector machines for genomic selection.

Authors

Ogutu Joseph O, Piepho Hans-Peter, Schulz-Streeck Torben

Affiliation

Bioinformatics Unit, Institute of Crop Science, University of Hohenheim, Fruwirthstrasse 23, 70599 Stuttgart, Germany.

Publication

BMC Proc. 2011 May 27;5 Suppl 3(Suppl 3):S11. doi: 10.1186/1753-6561-5-S3-S11.

DOI:10.1186/1753-6561-5-S3-S11
PMID:21624167
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3103196/
Abstract

BACKGROUND

Genomic selection (GS) involves estimating breeding values using molecular markers spanning the entire genome. Accurate prediction of genomic breeding values (GEBVs) presents a central challenge to contemporary plant and animal breeders. The existence of a wide array of marker-based approaches for predicting breeding values makes it essential to evaluate and compare their relative predictive performances to identify approaches able to accurately predict breeding values. We evaluated the predictive accuracy of random forests (RF), stochastic gradient boosting (boosting) and support vector machines (SVMs) for predicting genomic breeding values using dense SNP markers and explored the utility of RF for ranking the predictive importance of markers for pre-screening markers or discovering chromosomal locations of QTLs.

METHODS

We predicted GEBVs for one quantitative trait in a dataset simulated for the QTLMAS 2010 workshop. Predictive accuracy was measured as the Pearson correlation between GEBVs and observed values using 5-fold cross-validation and between predicted and true breeding values. The importance of each marker was ranked using RF and plotted against the position of the marker and associated QTLs on one of five simulated chromosomes.

RESULTS

The correlations between the predicted and true breeding values were 0.547 for boosting, 0.497 for SVMs, and 0.483 for RF, indicating better performance for boosting than for SVMs and RF.

CONCLUSIONS

Accuracy was highest for boosting, intermediate for SVMs and lowest for RF but differed little among the three methods and relative to ridge regression BLUP (RR-BLUP).
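
The accuracy measure described in the methods (Pearson correlation between predicted and observed values under 5-fold cross-validation) can be sketched as follows. This is a minimal illustration, not the authors' code: the data are a toy simulation rather than the QTLMAS 2010 dataset, `fit_marker_slopes` is a hypothetical stand-in learner (one least-squares slope per marker) used in place of RF, boosting, or SVMs, and pooling the out-of-fold predictions into a single correlation is an assumption, since the abstract does not say whether correlations were pooled or averaged across folds.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def fit_marker_slopes(X, y):
    """Hypothetical stand-in learner: one least-squares slope per marker."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    slopes = []
    for j in range(p):
        sxx = sum((row[j] - x_means[j]) ** 2 for row in X)
        sxy = sum((row[j] - x_means[j]) * (yi - y_mean) for row, yi in zip(X, y))
        slopes.append(sxy / sxx if sxx > 0 else 0.0)
    return y_mean, x_means, slopes


def predict(model, X):
    """Predict phenotypes as the mean plus the summed marker contributions."""
    y_mean, x_means, slopes = model
    return [y_mean + sum(s * (x - m) for s, x, m in zip(slopes, row, x_means))
            for row in X]


def cv_accuracy(X, y, k=5, seed=0):
    """Pearson correlation between out-of-fold predictions and observed values."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all records
    preds = [0.0] * len(y)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_marker_slopes([X[i] for i in train], [y[i] for i in train])
        for i, value in zip(fold, predict(model, [X[i] for i in fold])):
            preds[i] = value
    return pearson(preds, y)


def simulate(n=200, p=30, n_qtl=5, seed=1):
    """Toy data: genotypes coded 0/1/2, with the first n_qtl markers causal."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(row[:n_qtl]) + rng.gauss(0, 1) for row in X]
    return X, y


X, y = simulate()
accuracy = cv_accuracy(X, y)
print(f"5-fold CV accuracy (Pearson r): {accuracy:.3f}")
```

Swapping `fit_marker_slopes` and `predict` for another learner leaves the cross-validation loop unchanged, which is what makes a like-for-like comparison of methods on the same folds possible.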


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a979/3103196/8a10f6bc1cb8/1753-6561-5-S3-S11-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a979/3103196/c4e6c0bd4bcf/1753-6561-5-S3-S11-1.jpg

