Bastien B, Boukhobza T, Dumond H, Gégout-Petit A, Muller-Gueudin A, Thiébaut C
Transgene S.A., Illkirch-Graffenstaden Cedex, France.
Université de Lorraine, CNRS, CRAN, Nancy, France.
J Appl Stat. 2020 Oct 27;49(3):764-781. doi: 10.1080/02664763.2020.1837083. eCollection 2022.
We propose a new methodology for selecting and ranking covariates associated with a variable of interest when the data are high-dimensional and dependent but the observations are few. The methodology proceeds in successive steps: clustering of the covariates, decorrelation of the covariates within each cluster using latent factor analysis, selection by aggregating several adapted methods, and finally ranking. A simulation study shows the benefit of decorrelating within the different clusters of covariates. We first apply the method to transcriptomic data from 37 patients with advanced non-small-cell lung cancer who received chemotherapy, in order to select the transcriptomic covariates that explain survival after treatment. We then apply it to 79 breast tumor samples to define patient profiles for a new metastatic biomarker and its associated gene network, with the aim of personalizing treatment.
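The four steps named in the abstract (clustering, decorrelation, aggregated selection, ranking) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: the function names are hypothetical, clusters are taken as connected components of a thresholded correlation graph, removing the leading principal component of each cluster stands in for the latent factor model, and bootstrap selection frequency of the top marginal correlations stands in for the aggregation of adapted selection methods. The data are simulated, with three blocks of covariates each driven by one shared factor.

```python
import numpy as np


def cluster_covariates(X, threshold=0.7):
    """Assumed clustering: connected components of the graph |corr| > threshold."""
    p = X.shape[1]
    adj = np.abs(np.corrcoef(X, rowvar=False)) > threshold
    labels = -np.ones(p, dtype=int)
    current = 0
    for start in range(p):
        if labels[start] >= 0:
            continue  # already assigned to a cluster
        labels[start] = current
        stack = [start]
        while stack:  # depth-first walk over correlated neighbours
            j = stack.pop()
            for k in np.flatnonzero(adj[j]):
                if labels[k] < 0:
                    labels[k] = current
                    stack.append(k)
        current += 1
    return labels


def decorrelate(X, labels, n_factors=1):
    """Stand-in for the factor model: drop the leading PC(s) of each cluster."""
    Z = X - X.mean(axis=0)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if idx.size <= n_factors:
            continue  # cluster too small to carry a shared factor
        U, s, Vt = np.linalg.svd(Z[:, idx], full_matrices=False)
        Z[:, idx] -= (U[:, :n_factors] * s[:n_factors]) @ Vt[:n_factors]
    return Z


def selection_frequency(Z, y, top_k=5, n_boot=200, rng=None):
    """Stand-in for aggregated selection: how often each covariate is among
    the top_k absolute marginal correlations with y over bootstrap resamples."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = Z.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)
        Zb = Z[rows] - Z[rows].mean(axis=0)
        yb = y[rows] - y[rows].mean()
        denom = np.linalg.norm(Zb, axis=0) * np.linalg.norm(yb)
        score = np.abs(Zb.T @ yb) / np.where(denom > 0, denom, np.inf)
        counts[np.argsort(score)[-top_k:]] += 1
    return counts / n_boot


# Simulated data (assumption): 40 observations, 3 blocks of 20 covariates,
# each block sharing one latent factor; y depends on covariates 0 and 25.
rng = np.random.default_rng(0)
n, block, n_blocks = 40, 20, 3
factors = rng.normal(size=(n, n_blocks))
X = np.repeat(factors, block, axis=1) + 0.3 * rng.normal(size=(n, block * n_blocks))
y = X[:, 0] - X[:, 25] + 0.5 * rng.normal(size=n)

labels = cluster_covariates(X)             # step 1: clustering
Z = decorrelate(X, labels)                 # step 2: decorrelation per cluster
freq = selection_frequency(Z, y, rng=rng)  # step 3: aggregated selection
ranking = np.argsort(-freq)                # step 4: ranking by frequency
print("clusters found:", len(np.unique(labels)))
print("ten highest-ranked covariates:", ranking[:10])
```

Ranking by selection frequency rather than by a single fit is one simple way to aggregate over resamples; the threshold, the number of removed components and `top_k` are arbitrary here and would need tuning on real data.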