Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, Guangdong, China.
PLoS One. 2012;7(7):e38873. doi: 10.1371/journal.pone.0038873. Epub 2012 Jul 10.
Conventional gene selection methods based on principal component analysis (PCA) use only the first principal component (PC) of PCA or sparse PCA to select characteristic genes. These methods implicitly assume that the first PC plays a dominant role in gene selection. In a number of cases, however, this assumption does not hold, and conventional PCA-based methods then often give poor selection results. To improve the performance of PCA-based gene selection, we propose a gene selection method that weights PCs by their singular values (WPCS). Because different PCs differ in importance, the singular values are used as weights that represent the influence of each PC on gene selection. ROC curves and AUC statistics on artificial data show that our method outperforms the state-of-the-art methods. Moreover, experimental results on real gene expression data sets show that our method extracts more characteristic genes responsive to abiotic stresses than conventional gene selection methods.
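The idea of weighting PCs by singular values can be illustrated with a minimal sketch. The abstract does not state the exact scoring formula, so everything below is an assumption made for illustration: the data matrix is taken as genes by samples, each gene is centered across samples, and a gene's score is the sum of its absolute loadings on each PC multiplied by that PC's singular value. The function names `wpcs_scores` and `select_genes` and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def wpcs_scores(X, n_components=None):
    """Score genes by singular-value-weighted PC loadings (illustrative only).

    X is assumed to be a (genes x samples) expression matrix.
    """
    X = np.asarray(X, dtype=float)
    # Assumption: center each gene (row) across samples before the SVD.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Columns of U hold the gene loadings of each PC; s holds the singular values.
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    k = len(s) if n_components is None else min(n_components, len(s))
    # Assumed scoring rule: weight |loading| on each PC by that PC's singular value.
    return np.abs(U[:, :k]) @ s[:k]


def select_genes(X, n_genes, n_components=None):
    """Return the indices of the n_genes highest-scoring genes."""
    scores = wpcs_scores(X, n_components)
    return np.argsort(scores)[::-1][:n_genes]


# Synthetic example: 200 genes, 20 samples, 5 planted "characteristic" genes
# that follow two different sample patterns, so they load on different PCs.
rng = np.random.default_rng(0)
X = 0.1 * rng.standard_normal((200, 20))
pattern_a = np.where(np.arange(20) < 10, 1.0, -1.0)      # e.g. control vs. stress
pattern_b = np.where(np.arange(20) % 2 == 0, 1.0, -1.0)  # a second, orthogonal pattern
X[:3] += 5.0 * pattern_a
X[3:5] += 5.0 * pattern_b

top = select_genes(X, n_genes=5)
print(sorted(top.tolist()))
```

Passing `n_components=1` reduces the score to the first PC alone, which corresponds to the conventional approach the abstract contrasts against. In this toy setup the planted genes follow two orthogonal patterns, so a single PC cannot represent both groups, while the weighted sum over all PCs can.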