
Controlling the Overfitting of Heritability in Genomic Selection through Cross Validation.

Affiliation

Department of Botany and Plant Sciences, University of California, Riverside, USA.

Publication information

Sci Rep. 2017 Oct 20;7(1):13678. doi: 10.1038/s41598-017-14070-z.

DOI:10.1038/s41598-017-14070-z
PMID:29057969
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5651917/
Abstract

In genomic selection (GS), all the markers across the entire genome are used to conduct marker-assisted selection, such that each quantitative trait locus of a complex trait is in linkage disequilibrium with at least one marker. Although GS improves estimated breeding values and genetic gain, in most GS models the genetic variance is estimated from training samples with many trait-irrelevant markers, which leads to severe overfitting in the calculation of trait heritability. In this study, we used a series of simulations to demonstrate that including trait-irrelevant markers causes heritability to be overfitted, and that such overfitting can be effectively controlled through cross validation. In the proposed method, the genetic variance is simply the variance of the genetic values predicted through cross validation, the residual variance is the variance of the differences between the observed phenotypic values and the predicted genetic values, and these two resultant variance components are used for calculating the unbiased heritability. We also demonstrated that the heritability calculated through cross validation is equivalent to trait predictability, which objectively reflects the applicability of the GS models. The proposed method can be implemented with the Mixed Procedure in SAS or with our R package "GSMX", which is publicly available at https://cran.r-project.org/web/packages/GSMX/index.html.
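
The cross-validated variance components described in the abstract can be illustrated with a minimal Python sketch. It is not the GSMX or SAS implementation: the ridge-regression predictor, the function names `ridge_fit_predict` and `cv_heritability`, the fold count `k`, and the penalty `lam` are all assumptions made for illustration, and the ratio of genetic variance to the sum of genetic and residual variance is assumed as the heritability formula, since the abstract does not spell it out.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, lam):
    """Fit ridge regression on the training fold and predict the test fold.

    Ridge regression is a stand-in here (an assumption); any genomic
    selection model that returns predicted genetic values could be used.
    """
    mu = y_train.mean()
    x_mean = X_train.mean(axis=0)
    Xc = X_train - x_mean
    p = Xc.shape[1]
    # Closed-form ridge solution for the marker effects.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y_train - mu))
    return mu + (X_test - x_mean) @ beta


def cv_heritability(X, y, k=10, lam=1.0, seed=0):
    """Heritability from k-fold cross-validated genetic values.

    var_g: variance of the genetic values predicted out of fold.
    var_e: variance of (observed phenotype - predicted genetic value).
    Returns var_g / (var_g + var_e), the assumed heritability ratio.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), k)
    g_hat = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        # Each individual is predicted by a model that never saw it.
        g_hat[test_idx] = ridge_fit_predict(
            X[train_idx], y[train_idx], X[test_idx], lam
        )
    var_g = np.var(g_hat, ddof=1)
    var_e = np.var(y - g_hat, ddof=1)
    return var_g / (var_g + var_e)
```

Calling `cv_heritability(X, y)` with an individuals-by-markers genotype matrix `X` and a phenotype vector `y` returns a value between 0 and 1. Because every genetic value is predicted from a model fitted without that individual, trait-irrelevant markers cannot inflate the genetic variance the way they can when it is estimated on the training sample itself.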

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b45d/5651917/65301a9f2210/41598_2017_14070_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b45d/5651917/0daade9e6125/41598_2017_14070_Fig2_HTML.jpg
