Xu Wenlong, Wang Minghui, Zhang Xianghua, Wang Lirong, Feng Huanqing
Department of Electronic Science and Technology, University of Science and Technology of China, Hefei 230027, China.
Bioinformation. 2008 Apr 11;2(7):301-3. doi: 10.6026/97320630002301.
Gene selection aims to detect the most significantly expressed genes under different conditions in expression data. The current challenge in gene selection is comparing a large number of genes using a limited number of patient samples, which makes it a non-trivial task for simple statistical analysis. Filter methods applied in gene selection studies adopt various statistical measures, and the ability of these measures to discriminate phenotypes is crucial for classification and selection. Here we describe the standard deviation error distribution (SDED) method for gene selection, which uses within-class and among-class variation in gene expression data. We tested the method on 4 publicly available leukemia datasets and compared it with the GS2 and CHO methods. Across the datasets, the prediction accuracy of SDED was 0.8-4.2% higher than that of GS2 and 1.6-8.4% higher than that of CHO. Analyses of the related OMIM annotations and KEGG pathways indicated that SDED selects 4.0% and 6.1% more genes with biological significance than GS2 and CHO, respectively.
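The abstract does not state the SDED formula, so the sketch below is not SDED. It only illustrates the general idea of a filter method that scores each gene by comparing among-class variation with within-class variation, here with a plain variance ratio chosen for illustration. The function names, gene names, class labels, and expression values are all hypothetical.

```python
from statistics import mean, pvariance


def variance_ratio_score(values, labels):
    """Among-class variance divided by within-class variance for one gene.

    This is a generic filter score used for illustration, not the SDED statistic.
    """
    overall = mean(values)
    n = len(values)
    among = 0.0
    within = 0.0
    for cls in sorted(set(labels)):
        group = [v for v, lab in zip(values, labels) if lab == cls]
        weight = len(group) / n
        among += weight * (mean(group) - overall) ** 2
        within += weight * pvariance(group)
    # The small constant avoids division by zero when a gene is constant within every class.
    return among / (within + 1e-12)


def rank_genes(expression, labels, top_k):
    """Return the top_k gene names, highest score first.

    expression maps a gene name to a list of values, one per sample,
    in the same order as labels.
    """
    scores = {gene: variance_ratio_score(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data, made up for illustration: 3 genes, 6 samples, 2 classes.
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
expression = {
    "gene_a": [1.0, 1.1, 0.9, 3.0, 3.1, 2.9],  # large shift between classes
    "gene_b": [1.0, 3.0, 2.0, 1.1, 2.9, 2.0],  # varies mostly within classes
    "gene_c": [2.0, 2.1, 1.9, 2.3, 2.4, 2.2],  # small shift between classes
}
top = rank_genes(expression, labels, top_k=2)
print(top)
```

In this toy example the gene with the largest class shift relative to its within-class spread ranks first, and the gene whose variation lies almost entirely within classes is filtered out. A real comparison such as the one reported here would additionally feed the selected genes to a classifier and measure prediction accuracy.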