Center for Applied Statistical Research, School of Mathematics, Jilin University, 2699 Qianjin Street, Changchun, Jilin, 130012, China.
Division of Clinical Research, First Hospital of Jilin University, 71 Xinmin Street, Changchun, Jilin, 130021, China.
Sci Rep. 2017 Apr 7;7:46164. doi: 10.1038/srep46164.
In contrast to feature selection and gene set analysis, bi-level selection identifies not only important gene sets but also the important genes within those sets. Depending on the order of selection, bi-level selection methods fall into three categories: forward selection, which first selects relevant gene sets and then relevant individual genes within them; backward selection, which proceeds in the reverse order; and simultaneous selection, which performs the two tasks at once, usually with the aid of a penalized regression model. To test for the existence of subtype-specific prognostic genes in non-small cell lung cancer (NSCLC), we previously proposed the Cox-filter method, which examines the association between patients' survival time after diagnosis and one specific gene, the disease subtypes, and their interaction terms. In this study, we extend the Cox-filter method to carry out forward and backward bi-level selection. Using simulations and an NSCLC application, we demonstrate that forward selection outperforms backward selection and other relevant algorithms in our setting. Both proposed methods are readily understandable and interpretable. They are therefore useful tools for researchers interested in exploring the prognostic value of gene expression data for specific subtypes or stages of a disease.
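As an illustration of the kind of model the Cox-filter description implies, the per-gene fit can be sketched as a Cox proportional hazards model with a gene-by-subtype interaction. The notation here is an assumption and is not taken from the abstract: g is the expression of one gene, s is a binary subtype indicator (two subtypes are assumed for simplicity), h_0(t) is the unspecified baseline hazard, and the beta coefficients are the regression parameters.

```latex
% Assumed notation: g = expression of a single gene, s \in \{0,1\} = subtype indicator
h(t \mid g, s) = h_0(t)\,\exp\!\left(\beta_1 g + \beta_2 s + \beta_3\, g s\right)

% Hazard ratio per unit increase in g, within each subtype
\mathrm{HR}_{s=0} = e^{\beta_1}, \qquad \mathrm{HR}_{s=1} = e^{\beta_1 + \beta_3}
```

Under this sketch, a nonzero \beta_3 means that the gene's association with survival differs between the two subtypes, which is what would make the gene a candidate subtype-specific prognostic gene. With more than two subtypes, s and the interaction would each expand into several indicator terms.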