Kim Sangjin, Halabi Susan
Department of Biostatistics and Bioinformatics, Duke University Medical Center, Box 2717, Durham, NC 27710, USA.
Biomed Res Int. 2016;2016:8209453. doi: 10.1155/2016/8209453. Epub 2016 Aug 15.
Background. Iterative sure independence screening (ISIS) is a popular method for selecting important variables while retaining most of the informative variables relevant to the outcome in high-throughput data. However, it is not only computationally intensive but may also yield a high false discovery rate (FDR). We propose using the FDR as a screening step that reduces the dimension of the data and controls the FDR when combined with three popular variable selection methods: LASSO, SCAD, and MCP. Method. The three methods with the proposed screenings were applied to prostate cancer data with the presence of metastasis as the outcome. Results. Simulations showed that the three variable selection methods with the proposed screenings controlled the predefined FDR and produced high area under the receiver operating characteristic curve (AUROC) scores. When applied to the prostate cancer example, LASSO and MCP selected 12 and 8 genes and produced AUROC scores of 0.746 and 0.764, respectively. Conclusions. We demonstrated that the variable selection methods with the sequential use of FDR and ISIS not only controlled the predefined FDR in the final models but also had relatively high AUROC scores.
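To make the FDR screening idea concrete, the following is a minimal sketch of one common FDR-controlling rule, the Benjamini-Hochberg step-up procedure, applied to per-feature p-values. The abstract does not say which FDR procedure the authors used, so the choice of Benjamini-Hochberg, the function name `bh_screen`, and the example p-values are all assumptions made for illustration.

```python
def bh_screen(p_values, q=0.05):
    """Return the sorted indices of features kept by the
    Benjamini-Hochberg step-up rule at target FDR level q.

    Assumption: p_values holds one marginal p-value per feature,
    e.g. from a univariate test of each gene against the outcome.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Rank features by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    # Keep every feature ranked at or below the cutoff.
    return sorted(order[:cutoff])


# Hypothetical marginal p-values for six features (not from the paper).
p = [0.001, 0.20, 0.012, 0.75, 0.03, 0.04]
kept = bh_screen(p, q=0.05)
print(kept)  # [0, 2]
```

In a pipeline like the one the abstract outlines, the retained indices would define the reduced feature set passed on to ISIS and then to a penalized selector such as LASSO, SCAD, or MCP. Those later stages are not shown here.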