Principal component analysis revisited: fast multitrait genetic evaluations with smooth convergence.

Author information

Ahlinder Jon, Hall David, Suontama Mari, Sillanpää Mikko J

Affiliations

Department of Tree Breeding, Skogforsk, Box 3, Tomterna 1, Sävar SE-91821, Sweden.

Department of Ecology and Environmental Science, Umeå University, Umeå SE-90736, Sweden.

Publication information

G3 (Bethesda). 2024 Oct 21;14(12). doi: 10.1093/g3journal/jkae228.

DOI:10.1093/g3journal/jkae228

PMID:39429114

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11631533/

Abstract

Genetic evaluation is a cornerstone of breeding and population genetics, as it underpins important decisions on population management. Multivariate mixed model analysis, in which many traits are considered jointly, exploits genetic and environmental correlations between traits to improve accuracy. However, the number of parameters in the multitrait model grows quadratically with the number of traits, which limits its scalability. Here, we suggest using principal component analysis to reduce the dimensions of the response variables and then using the computed principal components as separate responses in the genetic evaluation analysis. Because principal components are orthogonal to each other, phenotypic covariance is absent between them, so a full multivariate analysis can be approximated by separate univariate analyses, which should speed up computations considerably. We compared the approach with both traditional multivariate analysis and a factor analytic approach in terms of computational requirements and rank lists according to predicted genetic merit on two forest tree datasets with 22 and 27 measured traits, respectively. The resulting rank lists of the top 50 individuals were in good agreement. Interestingly, the approach required only a few seconds of computation and had no convergence issues, unlike the traditional approach, which required considerably more time to run (7 and 10 h, respectively). The factor analytic approach took approximately 5-10 min. Our approach can easily handle missing data and can be used with any available linear mixed effects model software, as it does not require any specific implementation. The approach can help to mitigate difficulties with multitrait genetic analysis in both breeding and wild populations.
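
The dimension-reduction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are simulated, the traits are assumed to be complete (no missing values) and standardized before the decomposition, and the names `pca_scores` and `n_cov_params` are hypothetical helpers introduced only for this example. The sketch shows that principal component scores have no phenotypic covariance between them, which is what would allow each component to be passed as a separate response to a univariate mixed model. The mixed-model fit itself is not shown.

```python
import numpy as np


def pca_scores(Y):
    """PCA of a trait matrix Y (individuals x traits) via SVD.

    Assumption: Y has no missing values; each trait is centered and
    scaled to unit sample variance before the decomposition.
    Returns (scores, loadings, explained_variance).
    """
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * S                      # equivalent to Z @ Vt.T
    explained_variance = S**2 / (Z.shape[0] - 1)
    return scores, Vt.T, explained_variance


def n_cov_params(t):
    """Free parameters in one unstructured t x t covariance matrix."""
    return t * (t + 1) // 2


# Simulated correlated traits (illustrative data only).
rng = np.random.default_rng(0)
n_individuals, n_traits = 200, 5
mixing = rng.normal(size=(n_traits, n_traits))
Y = rng.normal(size=(n_individuals, n_traits)) @ mixing

scores, loadings, explained_variance = pca_scores(Y)

# Covariance between PC scores: the off-diagonal entries vanish.
cov = np.cov(scores, rowvar=False)
off_diag = cov - np.diag(np.diag(cov))
print("max |off-diagonal covariance|:", np.abs(off_diag).max())

# Parameter count of one unstructured covariance matrix for the
# two trait counts mentioned in the abstract.
for t in (22, 27):
    print(t, "traits:", n_cov_params(t), "covariance parameters")
```

Each column of `scores` would then serve as the response in its own univariate analysis. The parameter count assumes a single unstructured covariance matrix; a multitrait model typically estimates one such matrix per random term plus the residual, whereas the univariate analyses need only one variance per term and component.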
