Phylogenetic logistic regression for binary dependent variables.

Author information

Department of Zoology, University of Wisconsin-Madison, Madison, WI 53706, USA.

Publication information

Syst Biol. 2010 Jan;59(1):9-26. doi: 10.1093/sysbio/syp074. Epub 2009 Nov 4.

Abstract

We develop statistical methods for phylogenetic logistic regression in which the dependent variable is binary (0 or 1) and values are nonindependent among species, with phylogenetically related species tending to have the same value of the dependent variable. The methods are based on an evolutionary model of binary traits in which trait values switch between 0 and 1 as species evolve up a phylogenetic tree. The more frequently the trait values switch (i.e., the higher the rate of evolution), the more rapidly correlations between trait values for phylogenetically related species break down. Therefore, the statistical methods also give a way to estimate the phylogenetic signal of binary traits. More generally, the methods can be applied with continuous- and/or discrete-valued independent variables. Using simulations, we assess the statistical properties of the methods, including bias in the estimates of the logistic regression coefficients and the parameter that estimates the strength of phylogenetic signal in the dependent variable. These analyses show that, as with the case for continuous-valued dependent variables, phylogenetic logistic regression should be used rather than standard logistic regression when there is the possibility of phylogenetic correlations among species. Standard logistic regression does not properly account for the loss of information caused by resemblance of relatives and as a result is likely to give inflated type I error rates, incorrectly identifying regression parameters as statistically significantly different from zero when they are not.
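The link between switching rate and the decay of correlation among relatives can be illustrated with a small sketch. It assumes a simplified symmetric two-state Markov model, with a single rate `rate` for switching in either direction and an ancestor equally likely to be 0 or 1. This is not the parameterization used in the paper, and the function names are hypothetical. Under these assumptions, two sister species that diverged `t` time units ago have trait correlation exp(-4 * rate * t), which the simulation below checks against sampled pairs.

```python
import math
import random


def p_same(rate, t):
    """Probability that a symmetric two-state chain is in its starting
    state after time t (assumed model: equal 0->1 and 1->0 rates)."""
    return 0.5 + 0.5 * math.exp(-2.0 * rate * t)


def sister_correlation(rate, t):
    """Analytic correlation between the binary traits of two sister
    species that split from a common ancestor t time units ago."""
    return math.exp(-4.0 * rate * t)


def simulate_sister_correlation(rate, t, n_pairs=20000, seed=0):
    """Monte Carlo estimate of the same correlation."""
    rng = random.Random(seed)
    stay = p_same(rate, t)
    xs, ys = [], []
    for _ in range(n_pairs):
        ancestor = rng.random() < 0.5  # stationary ancestor state
        # Each descendant evolves independently along its own branch.
        a = ancestor if rng.random() < stay else not ancestor
        b = ancestor if rng.random() < stay else not ancestor
        xs.append(int(a))
        ys.append(int(b))
    mx = sum(xs) / n_pairs
    my = sum(ys) / n_pairs
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n_pairs
    vx = sum((x - mx) ** 2 for x in xs) / n_pairs
    vy = sum((y - my) ** 2 for y in ys) / n_pairs
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Higher switching rates erode the correlation between relatives faster.
    for rate in (0.05, 0.25, 1.0):
        analytic = sister_correlation(rate, 1.0)
        simulated = simulate_sister_correlation(rate, 1.0)
        print(f"rate={rate}: analytic={analytic:.3f}, simulated={simulated:.3f}")
```

A low rate leaves sister species strongly correlated (strong phylogenetic signal), while a high rate drives the correlation toward zero, which is the situation in which standard logistic regression would be approximately valid.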

