R$^{2}$s for Correlated Data: Phylogenetic Models, LMMs, and GLMMs.

Affiliation

Department of Integrative Biology, UW-Madison, Madison, WI 53706, USA.

Publication information

Syst Biol. 2019 Mar 1;68(2):234-251. doi: 10.1093/sysbio/syy060.

Abstract

Many researchers want to report an $R^{2}$ to measure the variance explained by a model. When the model includes correlation among data, such as phylogenetic models and mixed models, defining an $R^{2}$ faces two conceptual problems. (i) It is unclear how to measure the variance explained by predictor (independent) variables when the model contains covariances. (ii) Researchers may want the $R^{2}$ to include the variance explained by the covariances by asking questions such as "How much of the data is explained by phylogeny?" Here, I investigated three $R^{2}$s for phylogenetic and mixed models. $R^{2}_{resid}$ is an extension of the ordinary least-squares $R^{2}$ that weights residuals by variances and covariances estimated by the model; it is closely related to $R^{2}_{glmm}$ presented by Nakagawa and Schielzeth (2013. A general and simple method for obtaining $R^{2}$ from generalized linear mixed-effects models. Methods Ecol. Evol. 4:133-142). $R^{2}_{pred}$ is based on predicting each residual from the fitted model and computing the variance between observed and predicted values. $R^{2}_{lik}$ is based on the likelihood of fitted models, and therefore, reflects the amount of information that the models contain. These three $R^{2}$s are formulated as partial $R^{2}$s, making it possible to compare the contributions of predictor variables and variance components (phylogenetic signal and random effects) to the fit of models. Because partial $R^{2}$s compare a full model with a reduced model without components of the full model, they are distinct from marginal $R^{2}$s that partition additive components of the variance. I assessed the properties of the $R^{2}$s for phylogenetic models using simulations for continuous and binary response data (phylogenetic generalized least squares and phylogenetic logistic regression). Because the $R^{2}$s are designed broadly for any model for correlated data, I also compared $R^{2}$s for linear mixed models and generalized linear mixed models. $R^{2}_{resid}$, $R^{2}_{pred}$, and $R^{2}_{lik}$ all have similar performance in describing the variance explained by different components of models. However, $R^{2}_{pred}$ gives the most direct answer to the question of how much variance in the data is explained by a model. $R^{2}_{resid}$ is most appropriate for comparing models fit to different data sets, because it does not depend on sample sizes. And $R^{2}_{lik}$ is most appropriate to assess the importance of different components within the same model applied to the same data, because it is most closely associated with statistical significance tests.
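
The idea of a partial $R^{2}$ as a comparison of a full model with a reduced model can be illustrated with a minimal sketch. The abstract does not give formulas, so the forms used here are assumptions: a likelihood-ratio $R^{2}$ of the form $1 - \exp\{-\tfrac{2}{n}(\ell_{full} - \ell_{reduced})\}$, and a prediction-based $R^{2}$ of the form $1 - \mathrm{var}(y - \hat{y}_{full})/\mathrm{var}(y - \hat{y}_{reduced})$. The example uses uncorrelated Gaussian data fitted by ordinary least squares, the simplest case, in which both quantities should reduce to the ordinary $R^{2}$. The function names are hypothetical and are not taken from the paper or from any package.

```python
import numpy as np

def gaussian_fit(y, X):
    """OLS fit of y on X; returns the maximized Gaussian log-likelihood and residuals."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n  # maximum-likelihood variance estimate (divides by n)
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return loglik, resid

def partial_r2_lik(ll_full, ll_reduced, n):
    """Assumed likelihood-ratio form: 1 - exp(-2/n * (ll_full - ll_reduced))."""
    return 1.0 - np.exp(-2.0 / n * (ll_full - ll_reduced))

def partial_r2_pred(resid_full, resid_reduced):
    """Assumed prediction form: 1 - var(full residuals) / var(reduced residuals)."""
    return 1.0 - np.var(resid_full) / np.var(resid_reduced)

# Simulated, uncorrelated data (illustrative values only).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x])  # intercept + predictor
X_reduced = np.ones((n, 1))                # intercept only

ll_full, resid_full = gaussian_fit(y, X_full)
ll_reduced, resid_reduced = gaussian_fit(y, X_reduced)

r2_lik = partial_r2_lik(ll_full, ll_reduced, n)
r2_pred = partial_r2_pred(resid_full, resid_reduced)
r2_ols = 1.0 - (resid_full @ resid_full) / (resid_reduced @ resid_reduced)

print(f"R2_lik = {r2_lik:.4f}, R2_pred = {r2_pred:.4f}, OLS R2 = {r2_ols:.4f}")
```

With correlated data (phylogenetic or mixed models) the log-likelihoods and predictions would come from the fitted covariance structure rather than from OLS, and the three measures would generally no longer coincide.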
