Research School of Finance, Actuarial Studies and Statistics, Australian National University, Canberra, ACT 2601, Australia.
Biometrics. 2024 Jan 29;80(1). doi: 10.1093/biomtc/ujad001.
When building regression models for multivariate abundance data in ecology, it is important to account for correlation between species. Moreover, there is often evidence that species exhibit some degree of homogeneity in their responses to each environmental predictor, and that most species are informed by only a subset of predictors. We propose a generalized estimating equation (GEE) approach for simultaneous homogeneity pursuit (i.e., grouping species with similar coefficient values while allowing differing groups for different covariates) and variable selection in regression models for multivariate abundance data. Using GEEs allows us to straightforwardly account for between-response correlations through a (reduced-rank) working correlation matrix. We augment the GEE with both adaptive fused lasso-type and adaptive lasso-type penalties, which aim, respectively, to cluster the species-specific coefficients within each covariate and to encourage differing levels of sparsity across the covariates. Numerical studies demonstrate the strong finite sample performance of the proposed method relative to several existing approaches for modeling multivariate abundance data. Applying the proposed method to presence-absence records collected along the Great Barrier Reef in Australia reveals a substantial degree of both homogeneity and sparsity in species-environment relationships. We show this leads to a more parsimonious model for understanding the environmental drivers of seabed biodiversity, and results in stronger out-of-sample predictive performance relative to methods that do not accommodate such features.
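To make the two penalty types concrete, the following is a minimal sketch of how a combined adaptive fused lasso and adaptive lasso penalty could be evaluated for a species-by-covariate coefficient matrix. It is an illustration under stated assumptions, not the authors' implementation: the function name `adaptive_penalty`, the all-pairs fusion within each covariate, the weight form 1/|initial estimate|^gamma, the `eps` stabilizer, and the use of a single pair of tuning parameters (`lam_fuse`, `lam_sparse`) shared by all covariates are hypothetical choices made here for simplicity. The abstract does not give the exact penalty form.

```python
import numpy as np

def adaptive_penalty(B, B_init, lam_fuse, lam_sparse, gamma=1.0, eps=1e-8):
    """Illustrative penalty for a (species x covariates) coefficient matrix.

    B       : current coefficient estimates, shape (m, p)
    B_init  : initial (unpenalized) estimates used to build adaptive weights
    All names and the exact penalty form are assumptions for illustration.
    """
    B = np.asarray(B, dtype=float)
    B_init = np.asarray(B_init, dtype=float)
    m, p = B.shape
    # Index pairs (j, l) with j < l: all pairs of species
    j, l = np.triu_indices(m, k=1)
    total = 0.0
    for k in range(p):
        # Adaptive lasso part: small initial coefficients get large weights,
        # so they are pushed more strongly toward exactly zero.
        w_sparse = 1.0 / (np.abs(B_init[:, k]) + eps) ** gamma
        total += lam_sparse * np.sum(w_sparse * np.abs(B[:, k]))
        # Adaptive fused part: species whose initial coefficients for
        # covariate k are close get large weights on their difference,
        # encouraging them to share a common value (a group).
        w_fuse = 1.0 / (np.abs(B_init[j, k] - B_init[l, k]) + eps) ** gamma
        total += lam_fuse * np.sum(w_fuse * np.abs(B[j, k] - B[l, k]))
    return total
```

Because the fused term is computed separately for each column of `B`, species may be grouped differently for different covariates, which mirrors the homogeneity pursuit described above. In a full method this penalty would be combined with the GEE and its working correlation matrix, which the sketch does not attempt.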