Morris Tim P, White Ian R, Royston Patrick, Seaman Shaun R, Wood Angela M
Hub for Trials Methodology Research, MRC Clinical Trials Unit, Aviation House, 125 Kingsway, London WC2B 6NH, U.K.; MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge CB2 0SR, U.K.
Stat Med. 2014 Jan 15;33(1):88-104. doi: 10.1002/sim.5935. Epub 2013 Aug 6.
We are concerned with multiple imputation of the ratio of two variables, which is to be used as a covariate in a regression analysis. If the numerator and denominator are not missing simultaneously, it seems sensible to make use of the observed variable in the imputation model. One such strategy is to impute missing values for the numerator and denominator, or the log-transformed numerator and denominator, and then calculate the ratio of interest; we call this 'passive' imputation. Alternatively, missing ratio values might be imputed directly, with or without the numerator and/or the denominator in the imputation model; we call this 'active' imputation. In two motivating datasets, one involving body mass index as a covariate and the other involving the ratio of total to high-density lipoprotein cholesterol, we assess the sensitivity of results to the choice of imputation model and, as an alternative, explore fully Bayesian joint models for the outcome and incomplete ratio. Fully Bayesian approaches using WinBUGS were unusable in both datasets because of computational problems. In our first dataset, multiple imputation results are similar regardless of the imputation model; in the second, results are sensitive to the choice of imputation model. Sensitivity depends strongly on the coefficient of variation of the ratio's denominator. A simulation study demonstrates that passive imputation without transformation is risky because it can lead to downward bias when the coefficient of variation of the ratio's denominator is larger than about 0.1. Active imputation or passive imputation after log-transformation is preferable.
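The three strategies named in the abstract (passive imputation, passive imputation after log-transformation, and active imputation) can be sketched in a few lines. This is a minimal illustration and not the authors' implementation. It assumes that only the denominator is missing, that missingness is completely at random, and that each imputation is a single stochastic draw from a normal linear regression with the fitted parameters held fixed, which is a simplification of proper multiple imputation. The function names `draw_regression_imputations`, `impute_ratio` and `demo`, and all simulation settings, are hypothetical.

```python
import numpy as np


def draw_regression_imputations(target, predictors, missing, rng):
    """Fill missing entries of `target` with one stochastic regression draw.

    Ordinary least squares is fitted on the complete rows and Gaussian noise
    with the residual standard deviation is added to the predictions.
    Assumption: parameter uncertainty is ignored for brevity.
    """
    X = np.column_stack([np.ones(len(target)), predictors])
    obs = ~missing
    beta, *_ = np.linalg.lstsq(X[obs], target[obs], rcond=None)
    resid = target[obs] - X[obs] @ beta
    sigma = resid.std(ddof=X.shape[1])
    filled = target.copy()
    filled[missing] = X[missing] @ beta + rng.normal(0.0, sigma, missing.sum())
    return filled


def impute_ratio(num, den, y, missing, method, rng):
    """Return num/den, where `den` is NaN on the rows flagged by `missing`."""
    if method == "passive":
        # Impute the denominator on its raw scale, then form the ratio.
        preds = np.column_stack([num, y])
        den_imp = draw_regression_imputations(den, preds, missing, rng)
        return num / den_imp
    if method == "passive_log":
        # Impute log(denominator), back-transform, then form the ratio.
        preds = np.column_stack([np.log(num), y])
        log_den = draw_regression_imputations(np.log(den), preds, missing, rng)
        return num / np.exp(log_den)
    if method == "active":
        # Impute the ratio directly, with the numerator in the model.
        preds = np.column_stack([num, y])
        return draw_regression_imputations(num / den, preds, missing, rng)
    raise ValueError(f"unknown method: {method}")


def demo(seed=0, n=2000, m=5):
    """Print the average slope of y on the imputed ratio for each strategy."""
    rng = np.random.default_rng(seed)
    num = rng.lognormal(0.0, 0.3, n)
    den = rng.lognormal(0.0, 0.3, n)  # coefficient of variation of roughly 0.3
    y = num / den + rng.normal(0.0, 1.0, n)  # true slope on the ratio is 1
    missing = rng.random(n) < 0.3  # missing completely at random
    den_obs = den.copy()
    den_obs[missing] = np.nan
    for method in ("passive", "passive_log", "active"):
        slopes = [
            np.polyfit(impute_ratio(num, den_obs, y, missing, method, rng), y, 1)[0]
            for _ in range(m)
        ]
        print(method, float(np.mean(slopes)))


if __name__ == "__main__":
    demo()
```

Averaging the slope over the `m` completed datasets corresponds to the point estimate under Rubin's rules; the variance calculation is omitted. The raw-scale passive branch can draw denominators close to zero, which produces extreme imputed ratios. This offers one intuition for why that strategy may behave poorly when the denominator's coefficient of variation is not small.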