Sztepanacz Jacqueline L, Houle David
Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada.
Department of Biology, Florida State University, Tallahassee, FL, United States.
Evol Lett. 2024 Jan 23;8(3):361-373. doi: 10.1093/evlett/qrad064. eCollection 2024 Jun.
The breeder's equation, Δz̄ = Gβ, allows us to understand how genetics (the genetic covariance matrix, G) and the vector of linear selection gradients, β, interact to generate evolutionary trajectories. Estimation of β using multiple regression of trait values on relative fitness revolutionized the way we study selection in laboratory and wild populations. However, multicollinearity, or correlation of predictors, can lead to very high variances of, and covariances between, elements of β, posing a challenge for the interpretation of the parameter estimates. This is particularly relevant in the era of big data, where the number of predictors may approach or exceed the number of observations. A common approach to multicollinear predictors is to discard some of them, thereby losing any information that might be gained from those traits. Using simulations, we show how, on the one hand, multicollinearity can result in inaccurate estimates of selection, and, on the other, how the removal of correlated phenotypes from the analyses can provide a misguided view of the targets of selection. We show that regularized regression, which places data-validated constraints on the magnitudes of individual elements of β, can produce more accurate estimates of the total strength and direction of multivariate selection in the presence of multicollinearity and limited data, and often has little cost when multicollinearity is low. We also compare standard and regularized regression estimates of selection in a reanalysis of three published case studies, showing that regularized regression can improve fitness predictions in independent data. Our results suggest that regularized regression is a valuable tool that can be used as an important complement to traditional least-squares estimates of selection. In some cases, its use can lead to improved predictions of individual fitness, and improved estimates of the total strength and direction of multivariate selection.
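The contrast between least-squares and regularized estimates of β described in the abstract can be illustrated with a small simulation. The sketch below (Python; not the authors' code) draws strongly correlated traits, assigns relative fitness from a known gradient, and compares ordinary least squares with a cross-validated ridge regression (one form of "data-validated constraint"). All settings here, including the number of traits, the pairwise correlation of 0.9, the sample size, and the true gradient, are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch: OLS vs. ridge estimates of a selection gradient (beta)
# when predictors (traits) are highly collinear. Illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV

rng = np.random.default_rng(1)

n_traits, n_obs, rho = 8, 100, 0.9        # assumed: many collinear traits, modest sample
true_beta = np.zeros(n_traits)
true_beta[0] = 0.3                         # assumed: selection acts on trait 1 only

# Phenotypes from a compound-symmetric covariance (all pairwise correlations = rho)
P = np.full((n_traits, n_traits), rho) + (1 - rho) * np.eye(n_traits)
Z = rng.multivariate_normal(np.zeros(n_traits), P, size=n_obs)

# Relative fitness: linear function of traits plus noise, rescaled to mean 1
w = 1 + Z @ true_beta + rng.normal(scale=0.5, size=n_obs)
w = w / w.mean()

beta_ols = LinearRegression().fit(Z, w).coef_
beta_ridge = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(Z, w).coef_

def angle(a, b):
    """Angle in degrees between an estimated gradient and the true gradient."""
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos, -1, 1)))

print("OLS   beta:", np.round(beta_ols, 2), "| angle to true beta:", round(angle(beta_ols, true_beta), 1))
print("Ridge beta:", np.round(beta_ridge, 2), "| angle to true beta:", round(angle(beta_ridge, true_beta), 1))
```

Under collinearity like this, the OLS coefficients typically vary wildly across simulated datasets while the ridge estimate, whose penalty is chosen by cross-validation, usually points closer to the true direction of selection; with uncorrelated traits the two estimators behave similarly, consistent with the abstract's claim that regularization has little cost when multicollinearity is low.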