Trait genetic architecture and population structure determine model selection for genomic prediction in natural Arabidopsis thaliana populations.

Authors

Gibbs Patrick M, Paril Jefferson F, Fournier-Level Alexandre

Affiliations

School of BioSciences, The University of Melbourne, Royal Parade, Parkville, VIC 3010, Australia.

Agriculture Victoria Research, Department of Energy, Environment and Climate Action, La Trobe University, AgriBio, 5 Ring Road, Bundoora, VIC 3083, Australia.

Publication information

Genetics. 2025 Mar 17;229(3). doi: 10.1093/genetics/iyaf003.

Abstract

Genomic prediction applies to any agro- or ecologically relevant traits, with distinct ontologies and genetic architectures. Selecting the most appropriate model for the distribution of genetic effects and their associated allele frequencies in the training population is crucial. Linear regression models are often preferred for genomic prediction. However, linear models may not suit all genetic architectures and training populations. Machine learning approaches have been proposed to improve genomic prediction owing to their capacity to capture complex biology including epistasis. However, the applicability of different genomic prediction models, including non-linear, non-parametric approaches, has not been rigorously assessed across a wide variety of plant traits in natural outbreeding populations. This study evaluates genomic prediction sensitivity to trait ontology and the impact of population structure on model selection and prediction accuracy. Examining 36 quantitative traits in 1,000+ natural genotypes of the model plant Arabidopsis thaliana, we assessed the performance of penalized regression, random forest, and multilayer perceptron at producing genomic predictions. Regression models were generally the most accurate, except for biochemical traits where random forest performed best. We link this result to the genetic architecture of each trait, notably that biochemical traits have simpler genetic architecture than macroscopic traits. Moreover, complex macroscopic traits, particularly those related to flowering time and yield, were strongly correlated with population structure, while molecular traits were better predicted by fewer, independent markers. This study highlights the relevance of machine learning approaches for simple molecular traits and underscores the need to consider ancestral population history when designing training samples.
