Chambaz Antoine, Neuvial Pierre, van der Laan Mark J
MAP5, Université Paris Descartes and CNRS.
Electron J Stat. 2012;6:1059-1099. doi: 10.1214/12-EJS703.
We define a new measure of the importance of an exposure for a continuous outcome, accounting for potential confounders. The exposure has a reference level x_0 with positive mass and a continuum of other levels. To estimate this measure, we fully develop the semi-parametric methodology known as targeted minimum loss estimation (TMLE) [23, 22]. We cover the whole spectrum of its study: theory (convergence of the iterative procedure at the core of TMLE; consistency and asymptotic normality of the estimator), practical implementation, a simulation study, and an application to the genomic example that originally motivated this article. In that example, the exposure X and the response Y are, respectively, the DNA copy number and the expression level of a given gene in a cancer cell. The reference level is x_0 = 2, the expected DNA copy number in a normal cell, and the confounder is a measure of the methylation of the gene. The absence of a clear biological indication that X and Y can be interpreted as an exposure and a response, respectively, is not problematic.
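The data structure described in the abstract (an exposure with a point mass at a reference level plus a continuum of other levels, a confounder, and a continuous outcome) can be illustrated with a small simulation. The sketch below is not the TMLE procedure of the article. It assumes, purely for illustration, that the importance measure is the slope beta minimizing E[(E[Y|X,W] - E[Y|X=x_0,W] - beta (X - x_0))^2], and it uses a naive plug-in estimator built on a linear working model. The function names, the simulation design, and the true slope of 1.5 are all hypothetical.

```python
import numpy as np


def simulate(n, x0=2.0, beta=1.5, seed=0):
    """Draw (X, W, Y) with a point mass of X at x0 (hypothetical design)."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.0, 1.0, size=n)  # confounder, e.g. a methylation level
    # X equals the reference level with probability between 0.4 and 0.6,
    # depending on W; otherwise it is drawn from a continuum around x0.
    at_ref = rng.random(n) < 0.4 + 0.2 * w
    x = np.where(at_ref, x0, x0 + rng.normal(loc=w - 0.5, scale=1.0, size=n))
    # The outcome depends on both the confounder and the exposure.
    y = w + beta * (x - x0) + rng.normal(scale=1.0, size=n)
    return x, w, y


def plugin_slope(x, w, y, x0=2.0):
    """Naive plug-in estimate of the assumed slope parameter (not TMLE)."""
    # Working model for E[Y | X, W]: ordinary least squares on (1, X, W).
    design = np.column_stack([np.ones_like(x), x, w])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    theta_obs = design @ coef
    # Same model evaluated at the reference level x0 for every observation.
    design_ref = np.column_stack([np.ones_like(x), np.full_like(x, x0), w])
    theta_ref = design_ref @ coef
    d = x - x0
    # Closed-form minimizer of the squared-error criterion in beta.
    return float(np.sum(d * (theta_obs - theta_ref)) / np.sum(d ** 2))


x, w, y = simulate(5000)
estimate = plugin_slope(x, w, y)
print(f"share of observations at x0: {np.mean(x == 2.0):.2f}")
print(f"plug-in slope estimate: {estimate:.3f}")
```

With a linear working model this plug-in reduces to the fitted coefficient on X, so it is only as good as that model. A targeted estimator such as the one developed in the article is designed to avoid relying on a correctly specified regression in this way.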