Tu Shengxin, Li Chun, Shepherd Bryan E
Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, USA.
Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA.
Stat Med. 2025 Feb 10;44(3-4):e10326. doi: 10.1002/sim.10326.
Clustered data are common in practice. Clustering arises when subjects are measured repeatedly, or subjects are nested in groups (e.g., households, schools). It is often of interest to evaluate the correlation between two variables with clustered data. There are three commonly used Pearson correlation coefficients (total, between-, and within-cluster), which together provide an enriched perspective of the correlation. However, these Pearson correlation coefficients are sensitive to extreme values and skewed distributions. They also vary with data transformation, which is arbitrary and often difficult to choose, and they are not applicable to ordered categorical data. Current nonparametric correlation measures for clustered data are only for the total correlation. Here we define population parameters for the between- and within-cluster Spearman rank correlations. The definitions are natural extensions of the Pearson between- and within-cluster correlations to the rank scale. We show that the total Spearman rank correlation approximates a linear combination of the between- and within-cluster Spearman rank correlations, where the weights are functions of rank intraclass correlations of the two random variables. We also discuss the equivalence between the within-cluster Spearman rank correlation and the covariate-adjusted partial Spearman rank correlation. Furthermore, we describe estimation and inference for the three Spearman rank correlations, conduct simulations to evaluate the performance of our estimators, and illustrate their use with data from a longitudinal biomarker study and a clustered randomized trial.
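To make the three correlations concrete, the following minimal sketch applies the classical between/within decomposition to pooled ranks of synthetic data. It is an illustration only and does not reproduce the estimators or population definitions proposed in the paper: the helper names (`midranks`, `rank_correlation_pieces`), the simulated data, and the use of the correlation ratio of the ranks as a rough stand-in for the rank intraclass correlation are all assumptions made here. In this sample-level version the recombination is an exact algebraic identity, whereas the abstract states only an approximation for the population parameters.

```python
import numpy as np


def midranks(v):
    """Average ranks (1-based), with ties sharing their mean rank."""
    v = np.asarray(v)
    _, inv, counts = np.unique(v, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)                 # highest rank in each tied group
    return (last - (counts - 1) / 2.0)[inv]  # mean rank of each tied group


def rank_correlation_pieces(x, y, cluster):
    """Total, between-, and within-cluster correlations of pooled ranks.

    Hypothetical helper for illustration; not the paper's estimator.
    """
    rx, ry = midranks(x), midranks(y)
    _, idx = np.unique(cluster, return_inverse=True)
    n_k = np.bincount(idx)
    # Cluster mean of the ranks, repeated for every observation in the cluster.
    mx = (np.bincount(idx, weights=rx) / n_k)[idx]
    my = (np.bincount(idx, weights=ry) / n_k)[idx]
    tx, ty = rx - rx.mean(), ry - ry.mean()  # total deviations
    bx, by = mx - rx.mean(), my - ry.mean()  # between-cluster part
    wx, wy = rx - mx, ry - my                # within-cluster part

    def corr(a, b):
        return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

    # Share of rank variance lying between clusters (correlation ratio),
    # used here as a crude proxy for a rank intraclass correlation.
    eta_x = float(np.sum(bx**2) / np.sum(tx**2))
    eta_y = float(np.sum(by**2) / np.sum(ty**2))
    between, within = corr(bx, by), corr(wx, wy)
    recombined = (np.sqrt(eta_x * eta_y) * between
                  + np.sqrt((1 - eta_x) * (1 - eta_y)) * within)
    return {"total": corr(tx, ty), "between": between, "within": within,
            "eta_x": eta_x, "eta_y": eta_y, "recombined": float(recombined)}


# Synthetic data (an assumption for illustration): 50 clusters of 5 subjects,
# positively related cluster effects, negatively related within-cluster noise.
rng = np.random.default_rng(0)
cluster = np.repeat(np.arange(50), 5)
u = rng.normal(size=50)
v = 0.8 * u + 0.6 * rng.normal(size=50)
e = rng.normal(size=250)
x = np.exp(u[cluster] + e)                   # skewed, monotone transform
y = np.exp(v[cluster] - 0.7 * e + 0.7 * rng.normal(size=250))

res = rank_correlation_pieces(x, y, cluster)
print(res)
```

Because only ranks enter the calculation, applying any strictly increasing transformation to `x` or `y` (for example, taking logarithms) leaves every quantity unchanged, which is the robustness property that motivates moving from Pearson to rank-based measures.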