Demissie Serkalem, LaValley Michael P, Horton Nicholas J, Glynn Robert J, Cupples L Adrienne
Department of Biostatistics, Boston University School of Public Health, 715 Albany Street T4E, Boston, MA 02118-2526, U.S.A.
Stat Med. 2003 Feb 28;22(4):545-57. doi: 10.1002/sim.1340.
We studied bias due to missing exposure data in the proportional hazards regression model when using complete-case analysis (CCA). Eleven missing data scenarios were considered: one with missing completely at random (MCAR), four missing at random (MAR), and six non-ignorable missingness scenarios, with a variety of hazard ratios, censoring fractions, missingness fractions and sample sizes. When missingness was MCAR or dependent only on the exposure, there was negligible bias (2-3 per cent) that was similar to the difference between the estimate in the full data set with no missing data and the true parameter. In contrast, substantial bias occurred when missingness was dependent on outcome or both outcome and exposure. For models with hazard ratio of 3.5, a sample size of 400, 20 per cent censoring and 40 per cent missing data, the relative bias for the hazard ratio ranged between 7 per cent and 64 per cent. We observed important differences in the direction and magnitude of biases under the various missing data mechanisms. For example, in scenarios where missingness was associated with longer or shorter follow-up, the biases were notably different, although both mechanisms are MAR. The hazard ratio was underestimated (with larger bias) when missingness was associated with longer follow-up and overestimated (with smaller bias) when associated with shorter follow-up. If it is known that missingness is associated with a less frequently observed outcome or with both the outcome and exposure, CCA may result in an invalid inference and other methods for handling missing data should be considered.
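The contrast between MCAR and follow-up-dependent missingness can be illustrated with a small simulation. The sketch below is not the authors' simulation design. It assumes a binary exposure with prevalence 0.5, exponential event times, exponential censoring with hazard 0.4 (roughly 20 per cent censoring at a hazard ratio of 3.5), and a hypothetical missingness rule in which exposure is missing with probability 0.6 above the median follow-up time and 0.2 below it (40 per cent overall). The function names `cox_log_hr`, `simulate`, `observed_mask` and `run_simulation` are illustrative, and the Cox fit is a plain Newton-Raphson on the partial likelihood for a single covariate.

```python
import numpy as np

TRUE_HR = 3.5        # hazard ratio taken from the abstract
CENS_HAZARD = 0.4    # assumption: gives roughly 20% censoring


def cox_log_hr(time, event, x, iters=25):
    """Newton-Raphson estimate of the log hazard ratio for one covariate.

    Assumes continuous times (no ties), so no tie correction is applied.
    """
    order = np.argsort(-time)            # descending time
    event, x = event[order], x[order]
    beta = 0.0
    for _ in range(iters):
        w = np.exp(beta * x)
        s0 = np.cumsum(w)                # running sums over each risk set
        s1 = np.cumsum(w * x)
        s2 = np.cumsum(w * x * x)
        mean = s1 / s0
        score = np.sum(event * (x - mean))
        info = np.sum(event * (s2 / s0 - mean ** 2))
        beta += score / info
    return beta


def simulate(n, rng):
    """Draw exposure, follow-up time and event indicator for n subjects."""
    x = rng.binomial(1, 0.5, n).astype(float)
    t_event = rng.exponential(1.0 / TRUE_HR ** x)       # hazard = TRUE_HR**x
    t_cens = rng.exponential(1.0 / CENS_HAZARD, n)
    return np.minimum(t_event, t_cens), t_event <= t_cens, x


def observed_mask(time, rng, mechanism):
    """Return True where exposure is observed (40% missing on average)."""
    if mechanism == "mcar":
        p_miss = np.full(len(time), 0.4)
    else:  # "long_followup": hypothetical rule, missingness depends on time only
        p_miss = np.where(time > np.median(time), 0.6, 0.2)
    return rng.random(len(time)) >= p_miss


def run_simulation(n_rep=200, n=400, seed=0):
    """Mean estimated log hazard ratio: full data vs. complete-case analyses."""
    rng = np.random.default_rng(seed)
    est = {"full": [], "mcar": [], "long_followup": []}
    for _ in range(n_rep):
        time, event, x = simulate(n, rng)
        est["full"].append(cox_log_hr(time, event, x))
        for mech in ("mcar", "long_followup"):
            keep = observed_mask(time, rng, mech)
            est[mech].append(cox_log_hr(time[keep], event[keep], x[keep]))
    return {k: float(np.mean(v)) for k, v in est.items()}


if __name__ == "__main__":
    for name, mean_beta in run_simulation().items():
        print(f"{name:14s} mean HR estimate = {np.exp(mean_beta):.2f}")
```

Under these assumptions, the complete-case estimate with MCAR missingness should sit close to the full-data estimate, while the estimate under missingness tied to longer follow-up should fall below both, in line with the direction of bias reported above. The magnitudes depend on the assumed missingness rule and should not be read as reproducing the paper's figures.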