Uh Hae-Won, Hartgers Franca C, Yazdanbakhsh Maria, Houwing-Duistermaat Jeanine J
Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands.
BMC Immunol. 2008 Oct 17;9:59. doi: 10.1186/1471-2172-9-59.
The statistical analysis of immunological data may be complicated because precise quantitative levels cannot always be determined. Values below a given detection limit may not be observed (nondetects), and data with nondetects are called left-censored. Since nondetects cannot be considered missing at random, a statistician faced with such data must decide how to combine nondetects with detects. Until now, the common practice has been to impute each nondetect with a single value, such as half the detection limit, and then to conduct an ordinary regression analysis. The first aim of this paper is to give an overview of methods for analyzing censored data and to provide new methods for handling such data as alternatives to (ordinary) linear regression. The second aim is to compare these methods by simulation studies based on real data.
We compared six methods, both new and existing: deletion of nondetects, single substitution, extrapolation by regression on order statistics, multiple imputation using maximum likelihood estimation, tobit regression, and logistic regression. Deletion and extrapolation by regression on order statistics gave biased parameter estimates. Single substitution underestimated variances, and logistic regression suffered a loss of power. In the simulation studies, tobit regression performed well when the proportion of nondetects was less than 30%, and overall the multiple imputation method performed best.
Based on simulation studies, the newly developed multiple imputation method performed consistently well across scenarios with varying proportions of nondetects and sample sizes, and even in the presence of heteroscedastic errors.
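To make the difference between deletion, single substitution, and a censored-likelihood approach concrete, the following is a minimal sketch and not the authors' simulation. It assumes log-scale levels that are normally distributed, an arbitrary log detection limit, and no covariates, so the likelihood shown is only the intercept-only special case of the one that tobit regression maximizes. The function names, parameter values, and the simple grid search are illustrative assumptions.

```python
import math
import random


def simulate_log_levels(n, mu, sigma, seed):
    """Draw n hypothetical log-scale measurements from N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def censored_normal_mle(detects, n_nondetects, limit):
    """Grid-search MLE of (mu, sigma) for left-censored normal data.

    Detects contribute the normal log-density; each nondetect contributes
    log P(Y < limit). Additive constants are dropped from the log-likelihood.
    """
    k = len(detects)
    s1 = sum(detects)
    s2 = sum(y * y for y in detects)
    best = (-math.inf, None, None)
    for i in range(401):                      # mu grid: -1.00 .. 3.00
        mu = -1.0 + 0.01 * i
        ss = s2 - 2.0 * mu * s1 + k * mu * mu  # sum of (y - mu)^2 over detects
        for j in range(271):                  # sigma grid: 0.30 .. 3.00
            sigma = 0.3 + 0.01 * j
            z = (limit - mu) / sigma
            # Normal CDF via erfc, which stays accurate in the lower tail.
            log_cdf = math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))
            ll = (-k * math.log(sigma) - ss / (2.0 * sigma ** 2)
                  + n_nondetects * log_cdf)
            if ll > best[0]:
                best = (ll, mu, sigma)
    return best[1], best[2]


# Assumed settings: true mean and SD on the log scale, and a log detection
# limit chosen so that roughly 30% of values fall below it.
TRUE_MU, TRUE_SIGMA = 1.0, 1.0
LOG_LIMIT = 0.5

values = simulate_log_levels(2000, TRUE_MU, TRUE_SIGMA, seed=1)
detects = [y for y in values if y >= LOG_LIMIT]
n_nondetects = len(values) - len(detects)

# Deletion: ignore nondetects entirely.
deletion_mean = sum(detects) / len(detects)

# Single substitution: replace each nondetect by half the detection limit,
# i.e. log(limit / 2) on the log scale.
substitute = LOG_LIMIT - math.log(2.0)
substitution_mean = (sum(detects) + n_nondetects * substitute) / len(values)

# Censored likelihood: use the information that nondetects lie below the limit.
mle_mu, mle_sigma = censored_normal_mle(detects, n_nondetects, LOG_LIMIT)

print(f"proportion of nondetects: {n_nondetects / len(values):.2f}")
print(f"deletion mean:            {deletion_mean:.3f}")
print(f"substitution mean:        {substitution_mean:.3f}")
print(f"censored MLE (mu, sigma): {mle_mu:.2f}, {mle_sigma:.2f}")
```

The grid search keeps the sketch dependency-free. In practice a numerical optimizer and a full regression model with covariates would be used, and this toy setting says nothing about the variance or power comparisons reported above.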