Shardell Michelle, Hicks Gregory E
Department of Epidemiology and Public Health, University of Maryland, Baltimore, MD, U.S.A.
Stat Med. 2014 Nov 10;33(25):4437-52. doi: 10.1002/sim.6238. Epub 2014 Jun 17.
In studies of older adults, researchers often recruit proxy respondents, such as relatives or caregivers, when study participants cannot provide self-reports (e.g., because of illness). Proxies are usually only sought to report on behalf of participants with missing self-reports; thus, either a participant self-report or proxy report, but not both, is available for each participant. Furthermore, the missing-data mechanism for participant self-reports is not identifiable and may be nonignorable. When exposures are binary and participant self-reports are conceptualized as the gold standard, substituting error-prone proxy reports for missing participant self-reports may produce biased estimates of outcome means. Researchers can handle this data structure by treating the problem as one of misclassification within the stratum of participants with missing self-reports. Most methods for addressing exposure misclassification require validation data, replicate data, or an assumption of nondifferential misclassification; other methods may result in an exposure misclassification model that is incompatible with the analysis model. We propose a model that imposes none of the aforementioned requirements and still preserves model compatibility. Two user-specified tuning parameters encode the exposure misclassification model. Two proposed approaches estimate outcome means standardized for (potentially) high-dimensional covariates using multiple imputation followed by propensity score methods. The first method is parametric and uses maximum likelihood to estimate the exposure misclassification model (i.e., the imputation model) and the propensity score model (i.e., the analysis model); the second method is nonparametric and uses boosted classification and regression trees to estimate both models. We apply both methods to a study of elderly hip fracture patients.
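The general workflow described in the abstract (impute the unobserved self-reported exposure for proxy-reported participants, then estimate standardized outcome means with propensity score weights) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say what the two tuning parameters are, so the sketch assumes, purely for illustration, that they are the positive and negative predictive values of the proxy report (`ppv`, `npv`). It also assumes a single discrete covariate `x` and a stratum-proportion propensity score in place of the maximum likelihood or boosted-tree models. All function and field names are hypothetical.

```python
import random
from statistics import mean


def impute_exposure(proxy_report, ppv, npv, rng):
    """Draw a plausible self-reported exposure for a proxy-reported participant.

    ASSUMPTION (not from the abstract): the two tuning parameters are
    ppv = P(self-report = 1 | proxy = 1) and npv = P(self-report = 0 | proxy = 0).
    """
    p_exposed = ppv if proxy_report == 1 else 1.0 - npv
    return 1 if rng.random() < p_exposed else 0


def ipw_means(rows):
    """Normalized inverse-probability-weighted outcome means by exposure.

    rows: list of (covariate_stratum, exposure, outcome) tuples.
    The propensity score is the exposed proportion within each stratum,
    a simple stand-in for a fitted propensity model.
    """
    counts = {}
    for x, a, _ in rows:
        n, k = counts.get(x, (0, 0))
        counts[x] = (n + 1, k + a)

    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for x, a, y in rows:
        n, k = counts[x]
        ps = k / n
        # Probability of the exposure actually received; it is positive
        # because this row itself contributes to its stratum's count.
        prob_received = ps if a == 1 else 1.0 - ps
        weight = 1.0 / prob_received
        num[a] += weight * y
        den[a] += weight
    return {a: (num[a] / den[a] if den[a] > 0 else float("nan")) for a in (0, 1)}


def mi_standardized_means(data, ppv, npv, n_imputations=20, seed=0):
    """Multiple imputation followed by propensity score weighting.

    data: list of dicts with keys "x", "y", "self_report" (0, 1, or None)
    and "proxy_report" (0, 1, or None); exactly one report is present.
    Returns the average of the per-imputation estimates (the point-estimate
    part of Rubin's rules; variance estimation is omitted here).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_imputations):
        rows = []
        for d in data:
            if d["self_report"] is not None:
                exposure = d["self_report"]
            else:
                exposure = impute_exposure(d["proxy_report"], ppv, npv, rng)
            rows.append((d["x"], exposure, d["y"]))
        estimates.append(ipw_means(rows))
    return {a: mean(e[a] for e in estimates) for a in (0, 1)}


if __name__ == "__main__":
    toy = [
        {"x": 0, "y": 2.0, "self_report": 1, "proxy_report": None},
        {"x": 0, "y": 1.0, "self_report": None, "proxy_report": 0},
        {"x": 1, "y": 4.0, "self_report": None, "proxy_report": 1},
        {"x": 1, "y": 3.0, "self_report": 0, "proxy_report": None},
    ]
    print(mi_standardized_means(toy, ppv=0.9, npv=0.8))
```

Because the tuning parameters are user-specified rather than estimated, a sketch like this would typically be rerun over a grid of `ppv` and `npv` values to see how sensitive the standardized means are to the assumed misclassification model. If an imputed data set leaves a covariate stratum with no participants at one exposure level, the weighted mean for that level is no longer standardized to the full sample, which a real analysis would need to check.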