Department of Integrative Biology, UW-Madison, Madison, WI 53706, USA.
Syst Biol. 2019 Mar 1;68(2):234-251. doi: 10.1093/sysbio/syy060.
Many researchers want to report an $R^{2}$ to measure the variance explained by a model. When the model includes correlation among data, such as phylogenetic models and mixed models, defining an $R^{2}$ faces two conceptual problems. (i) It is unclear how to measure the variance explained by predictor (independent) variables when the model contains covariances. (ii) Researchers may want the $R^{2}$ to include the variance explained by the covariances by asking questions such as "How much of the data is explained by phylogeny?" Here, I investigated three $R^{2}$s for phylogenetic and mixed models. $R^{2}_{resid}$ is an extension of the ordinary least-squares $R^{2}$ that weights residuals by variances and covariances estimated by the model; it is closely related to $R^{2}_{glmm}$ presented by Nakagawa and Schielzeth (2013. A general and simple method for obtaining $R^{2}$ from generalized linear mixed-effects models. Methods Ecol. Evol. 4:133-142). $R^{2}_{pred}$ is based on predicting each residual from the fitted model and computing the variance between observed and predicted values. $R^{2}_{lik}$ is based on the likelihood of fitted models, and therefore, reflects the amount of information that the models contain. These three $R^{2}$s are formulated as partial $R^{2}$s, making it possible to compare the contributions of predictor variables and variance components (phylogenetic signal and random effects) to the fit of models. Because partial $R^{2}$s compare a full model with a reduced model without components of the full model, they are distinct from marginal $R^{2}$s that partition additive components of the variance. I assessed the properties of the $R^{2}$s for phylogenetic models using simulations for continuous and binary response data (phylogenetic generalized least squares and phylogenetic logistic regression).
Because the $R^{2}$s are designed broadly for any model for correlated data, I also compared $R^{2}$s for linear mixed models and generalized linear mixed models. $R^{2}_{resid}$, $R^{2}_{pred}$, and $R^{2}_{lik}$ all have similar performance in describing the variance explained by different components of models. However, $R^{2}_{pred}$ gives the most direct answer to the question of how much variance in the data is explained by a model. $R^{2}_{resid}$ is most appropriate for comparing models fit to different data sets, because it does not depend on sample sizes. And $R^{2}_{lik}$ is most appropriate to assess the importance of different components within the same model applied to the same data, because it is most closely associated with statistical significance tests.
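
The idea of a partial $R^{2}$ that compares a full model with a reduced model can be illustrated with a minimal sketch. The abstract does not state the formula for $R^{2}_{lik}$, so the code below assumes the common likelihood-ratio form $1 - \exp\left(-\tfrac{2}{n}(\ell_{full} - \ell_{reduced})\right)$; the function names `gaussian_loglik` and `partial_r2_lik` are hypothetical, and the data are simulated as independent observations rather than from a phylogenetic or mixed model.

```python
import numpy as np

def gaussian_loglik(y, X):
    """Return (maximized Gaussian log-likelihood, SSE) for an OLS fit of y on X.

    The error variance is estimated by maximum likelihood (SSE / n).
    """
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta) ** 2))
    sigma2 = sse / n
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return loglik, sse

def partial_r2_lik(ll_full, ll_reduced, n):
    """Likelihood-ratio partial R^2 (assumed form, not taken from the abstract)."""
    return 1.0 - np.exp(-2.0 / n * (ll_full - ll_reduced))

# Simulated, independent data (an illustrative assumption).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x])  # intercept + predictor
X_reduced = np.ones((n, 1))                # intercept only

ll_full, sse_full = gaussian_loglik(y, X_full)
ll_reduced, sse_reduced = gaussian_loglik(y, X_reduced)

r2_lik = partial_r2_lik(ll_full, ll_reduced, n)
r2_ols = 1.0 - sse_full / sse_reduced      # ordinary least-squares R^2

print(f"R2_lik = {r2_lik:.4f}, OLS R2 = {r2_ols:.4f}")
```

For independent Gaussian errors the two quantities agree, because the log-likelihood difference reduces to $-\tfrac{n}{2}\log(SSE_{full}/SSE_{reduced})$. With correlated data the log-likelihoods would instead come from the fitted phylogenetic or mixed model, and the reduced model could drop either a predictor or a variance component.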