Research Center of Modernization of Traditional Chinese Medicines, College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China.
Analyst. 2011 Apr 7;136(7):1456-63. doi: 10.1039/c0an00667j. Epub 2011 Feb 14.
Selecting a small subset of informative genes is important for the accurate prediction of clinical tumor samples. Based on model population analysis, this study proposes a novel variable selection method, noise incorporated subwindow permutation analysis (NISPA), designed to work with support vector machines (SVMs). The essence of NISPA is that one noise variable is added to each sampled sub-dataset; the distribution of the variable importance of this added noise can then be computed and serves as a common reference against which the experimental variables are evaluated. Using the non-parametric Mann-Whitney U test, a P value is assigned to each variable, describing the extent to which the importance distributions of the gene variable and the noise variable differ. Based on the computed P values, all variables can be ranked and a small subset of informative variables selected to build the model. Moreover, with NISPA we are the first to classify variables in finer detail than other methods do, as informative, uninformative (noise), or interfering. Two microarray datasets are used to evaluate the performance of NISPA. The results show that variable selection with NISPA can significantly reduce the prediction errors of SVM classifiers. We conclude that NISPA is a good alternative variable selection algorithm.
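The procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the importance measure, the sampling scheme, or the classification rule, so the sketch assumes that importance is the increase in test error after permuting a variable, that each sub-dataset is a random split with a random subset of variables, and that a variable is "interfering" when its importance is significantly below that of the noise. The function names (`nispa_sketch`, `classify_variables`) and all parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.svm import SVC


def nispa_sketch(X, y, n_iter=300, n_vars=4, test_frac=0.3, seed=0):
    """Return (p_values, mean_diff) per variable, using added noise as reference.

    Assumption: importance = test error after permuting a variable
    minus the unpermuted test error of a linear SVM.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_test = max(2, int(round(test_frac * n)))
    gene_imp = [[] for _ in range(p)]  # importance values per gene
    noise_imp = []                     # importance values of the added noise

    for _ in range(n_iter):
        # Sample a sub-dataset: random train/test split, random variable subset.
        order = rng.permutation(n)
        test, train = order[:n_test], order[n_test:]
        if np.unique(y[train]).size < 2:
            continue  # an SVM needs both classes in the training part
        cols = rng.choice(p, size=n_vars, replace=False)

        # Add one noise variable as the last column of the sub-dataset.
        Xs = np.column_stack([X[:, cols], rng.standard_normal(n)])
        clf = SVC(kernel="linear", C=1.0).fit(Xs[train], y[train])
        base_err = np.mean(clf.predict(Xs[test]) != y[test])

        # Permute each column of the test part in turn and record the change.
        for j in range(Xs.shape[1]):
            Xp = Xs[test].copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            delta = np.mean(clf.predict(Xp) != y[test]) - base_err
            if j < n_vars:
                gene_imp[cols[j]].append(delta)
            else:
                noise_imp.append(delta)

    p_values = np.ones(p)
    mean_diff = np.zeros(p)
    for g in range(p):
        if not gene_imp[g] or not noise_imp:
            continue  # variable never sampled; leave P = 1
        mean_diff[g] = np.mean(gene_imp[g]) - np.mean(noise_imp)
        pooled = np.concatenate([gene_imp[g], noise_imp])
        if np.ptp(pooled) == 0:
            continue  # all values identical; the test is undefined
        _, p_values[g] = mannwhitneyu(gene_imp[g], noise_imp,
                                      alternative="two-sided")
    return p_values, mean_diff


def classify_variables(p_values, mean_diff, alpha=0.05):
    """Assumed rule: significant and above noise = informative,
    significant and below noise = interfering, otherwise uninformative."""
    labels = []
    for pv, d in zip(p_values, mean_diff):
        if pv < alpha and d > 0:
            labels.append("informative")
        elif pv < alpha and d < 0:
            labels.append("interfering")
        else:
            labels.append("uninformative")
    return labels
```

Sorting the variables by P value and keeping the top-ranked informative ones would then give the subset used to build the final SVM. The significance threshold `alpha` and the number of sampled sub-datasets are illustrative choices, not values taken from the paper.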