Anderson Sean A S, Kaushik Sachin, Matute Daniel R
School of Biological Sciences, Georgia Institute of Technology.
Department of Biology, University of North Carolina at Chapel Hill.
bioRxiv. 2024 Nov 30:2024.11.28.625927. doi: 10.1101/2024.11.28.625927.
A powerful but poorly understood analysis in ecology and evolutionary biology is the comparative study of lineage-pair traits. "Lineage-pair traits" are characters like 'diet niche overlap' and 'strength of reproductive isolation' that are defined for pairs of lineages instead of individual taxa. Comparative tests for causal relationships among such variables have led to groundbreaking insights in several classic studies, but the statistical validity of these analyses has been unclear due to the complex dependency structure of the data. Specifically, lineage-pair datasets contain non-independent observations, but studies to date have relied on untested workarounds for data dependency rather than direct models of lineage-pair covariance, and the statistical consequences of non-independence have not been thoroughly explored. Here we consider how evolutionary relatedness among taxa translates into non-independence among taxonomic pairs. We develop models by which phylogenetic signal in an underlying character generates covariance among pairs in a lineage-pair trait. We incorporate the resulting lineage-pair covariance matrix into a modified version of phylogenetic generalized least squares and a new beta regression model suitable for bounded response variables. Both models outperform previous approaches in simulation tests. We re-analyze two empirical datasets and find dramatic improvements in model fit and, in the case of avian hybridization data, an even stronger relationship between pair age and reproductive isolation (RI) than revealed by standard linear regression. We present a new tool, the R package , to allow empiricists from a variety of biological fields to test relationships among pairwise-defined variables in a manner that is statistically robust and more straightforward to implement.
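To illustrate how covariance among taxa can propagate to covariance among pairs, the following minimal sketch works through one simple case. It assumes, purely for illustration, that the lineage-pair trait is the difference d_ij = x_i - x_j of an underlying trait x whose covariance among taxa is a phylogenetic matrix C (for example, shared branch lengths under Brownian motion). Under that assumption, Cov(d_ij, d_kl) = C_ik - C_il - C_jk + C_jl. The function name `pair_covariance` and the four-taxon matrix are hypothetical and are not taken from the paper or its R package; the models developed in the paper may define lineage-pair covariance differently.

```python
from itertools import combinations


def pair_covariance(C, pairs):
    """Covariance among pairwise differences d_ij = x_i - x_j.

    C is the taxon-level covariance matrix of the underlying trait x.
    By bilinearity, Cov(x_i - x_j, x_k - x_l) = C_ik - C_il - C_jk + C_jl.
    This is an illustrative assumption, not necessarily the paper's model.
    """
    return [
        [C[i][k] - C[i][l] - C[j][k] + C[j][l] for (k, l) in pairs]
        for (i, j) in pairs
    ]


# Hypothetical taxon covariance for a tree ((A,B),(C,D)) with total depth 1.0,
# where each sister pair shares 0.6 units of branch length from the root.
C = [
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
]

# All unordered pairs of the four taxa, as index tuples (i, j) with i < j.
pairs = list(combinations(range(len(C)), 2))

# Lineage-pair covariance matrix, one row and column per pair.
V = pair_covariance(C, pairs)

for pair, row in zip(pairs, V):
    print(pair, [round(v, 2) for v in row])
```

In this toy example the pairs (A,B) and (C,D) share no taxa and no branch history, so their covariance is zero, whereas pairs such as (A,C) and (A,D) share a taxon and covary. A matrix like V is the kind of object that could take the place of the taxon-level covariance matrix in a generalized least squares fit.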