Lyles R H, Fan D, Chuachoowong R
Department of Biostatistics, The Rollins School of Public Health, Emory University, 1518 Clifton Rd. N.E., Atlanta, GA 30322, USA.
Stat Med. 2001 Oct 15;20(19):2921-33. doi: 10.1002/sim.901.
When assessing a correlation between two exposure or biological marker variables, one sometimes encounters the problem of indeterminate values for one of the variables due to an assay detection limit. In this event, investigators often report correlation coefficients computed after removing the pairs involving non-detectable values, or after substituting some small constant for those values. These ad hoc practices can lead to bias in both point and confidence interval estimates of the true correlation coefficient. To address this issue, we consider two parametric techniques for estimating the correlation in the presence of left censoring for one of the variables. The first is a maximum likelihood approach, and the second is an adaptation of multiple imputation motivated primarily by potential benefits in confidence interval coverage. Both of the estimators studied reduce to the standard Pearson's correlation coefficient in the event of no censoring, and hence are valid in cases where this measure would be appropriate for the complete data. We assess these approaches empirically and contrast them with ad hoc methods for estimating the correlation between cervicovaginal human immunodeficiency virus (HIV) viral load measurements and CD4+ lymphocyte counts from HIV positive women enrolled in a clinical trial conducted in Bangkok, Thailand.
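The maximum likelihood idea described above can be illustrated with a minimal sketch. It is not the authors' implementation, and the function name `censored_corr_mle` and its arguments are hypothetical. It assumes that the pair (X, Y) is bivariate normal (for example after a log transformation), that X is fully observed, and that any value of Y below a single known detection limit is a non-detect. Detected pairs contribute the joint density f(x) f(y | x), and non-detects contribute f(x) P(Y < limit | x). The sketch returns a point estimate only; it does not cover the multiple imputation approach or confidence intervals.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import log_ndtr


def _norm_logpdf(v, mean, sd):
    # Log density of a normal distribution, written out explicitly.
    return -0.5 * np.log(2.0 * np.pi) - np.log(sd) - 0.5 * ((v - mean) / sd) ** 2


def censored_corr_mle(x, y, limit):
    """ML estimate of corr(X, Y) when Y is left-censored at `limit`.

    Assumption: non-detects are coded in `y` as any number below `limit`
    (their recorded values are ignored), and at least a few pairs are detected.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cens = y < limit  # True where Y is a non-detect

    def negloglik(theta):
        mx, my, log_sx, log_sy, z = theta
        sx, sy = np.exp(log_sx), np.exp(log_sy)  # keeps both SDs positive
        rho = np.tanh(z)                         # keeps rho inside (-1, 1)
        cond_mean = my + rho * sy / sx * (x - mx)
        cond_sd = sy * np.sqrt(1.0 - rho ** 2)
        ll = _norm_logpdf(x, mx, sx).sum()       # marginal of X, all pairs
        # Detected pairs: conditional density of Y given X.
        ll += _norm_logpdf(y[~cens], cond_mean[~cens], cond_sd).sum()
        # Non-detects: log P(Y < limit | X).
        ll += log_ndtr((limit - cond_mean[cens]) / cond_sd).sum()
        return -ll

    # Starting values: X from all pairs, Y and rho from the detected pairs only.
    yd, xd = y[~cens], x[~cens]
    r0 = np.clip(np.corrcoef(xd, yd)[0, 1], -0.99, 0.99)
    start = [x.mean(), yd.mean(), np.log(x.std()), np.log(yd.std()), np.arctanh(r0)]
    res = minimize(negloglik, start, method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})
    return float(np.tanh(res.x[4]))
```

With no values below the limit, the censored term vanishes and the estimate agrees with Pearson's correlation coefficient up to optimizer tolerance, consistent with the reduction property noted in the abstract.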