Schumi Jennifer, DiRienzo A Gregory, DeGruttola Victor
Statistics Collaborative, Inc.
Int J Biostat. 2008 Sep 29;4(1):Article 18. doi: 10.2202/1557-4679.1102.
Understanding how long-term clinical outcomes relate to short-term response to therapy is an important research topic with a variety of applications. In HIV, early measures of viral RNA levels are known to be strong prognostic indicators of future viral load response. However, mutations observed in the high-dimensional viral genotype at an early time point may change this prognosis. Unfortunately, some subjects may not have a viral genetic sequence measured at the early time point, and the sequence may be missing for reasons related to the outcome. Complete-case analyses are generally biased when the data are not missing completely at random, and methods incorporating multiple imputation may not be well suited to the analysis of high-dimensional data. We propose a semiparametric multiple testing approach to the problem of identifying associations between potentially missing high-dimensional covariates and response. Following the recent exposition by Tsiatis, unbiased nonparametric summary statistics are constructed by inversely weighting the complete cases according to the conditional probability of being observed, given the data that are observed for each subject. The resulting summary statistics are unbiased under the assumption that data are missing at random. We illustrate our approach through an application to data from a recent AIDS clinical trial, and demonstrate its finite-sample properties with simulations.
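The inverse weighting of complete cases described above can be illustrated with a minimal sketch on simulated data. It is not the authors' test statistic or their multiple testing procedure. Everything in it is an assumption made for illustration: a single binary mutation indicator `g` (possibly missing), a binary early viral load marker `v` and a binary outcome `y` (both always observed), and a missingness mechanism that depends only on `v` and `y`, so that the data are missing at random. The observation probabilities are estimated here by the observed fraction within each `(v, y)` cell; in practice a regression model would more commonly be used.

```python
import random
from collections import defaultdict


def simulate(n, seed=0):
    """Simulate (g, v, y, r) rows; all names and probabilities are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g = 1 if rng.random() < 0.3 else 0                      # mutation present (may be missing)
        v = 1 if rng.random() < 0.4 + 0.3 * g else 0            # high early viral load (always observed)
        y = 1 if rng.random() < 0.2 + 0.3 * g + 0.3 * v else 0  # poor response (always observed)
        p_obs = 0.9 - 0.3 * v - 0.3 * y                         # depends only on observed data: MAR
        r = 1 if rng.random() < p_obs else 0                    # 1 if the genotype was measured
        rows.append((g, v, y, r))
    return rows


def ipw_weights(rows):
    """Weight each complete case by 1 / (estimated probability of being observed)."""
    n_cell = defaultdict(int)
    n_obs = defaultdict(int)
    for g, v, y, r in rows:
        n_cell[(v, y)] += 1
        n_obs[(v, y)] += r
    # Incomplete cases get weight 0; complete cases get n_cell / n_obs for their cell.
    return [n_cell[(v, y)] / n_obs[(v, y)] if r else 0.0 for g, v, y, r in rows]


def mean_difference(rows, weights):
    """Weighted proportion with y = 1 among g = 1 minus that among g = 0."""
    num = [0.0, 0.0]
    den = [0.0, 0.0]
    for (g, v, y, r), w in zip(rows, weights):
        num[g] += w * y
        den[g] += w
    return num[1] / den[1] - num[0] / den[0]


rows = simulate(50000)
full = mean_difference(rows, [1.0] * len(rows))           # uses g for everyone (unavailable in practice)
cc = mean_difference(rows, [float(row[3]) for row in rows])  # unweighted complete cases
ipw = mean_difference(rows, ipw_weights(rows))            # inverse-probability-weighted complete cases
print(f"full data: {full:.3f}  complete case: {cc:.3f}  IPW: {ipw:.3f}")
```

Under this simulation design the true difference is 0.71 - 0.32 = 0.39. The weighted estimate should land close to that value, while the unweighted complete-case estimate is pulled away from it because subjects with poor response are less likely to have a measured genotype.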