Department of Mathematical Sciences, University of Essex, Wivenhoe Park, CO4 3SQ Colchester, UK.
BMC Bioinformatics. 2014 Aug 11;15(1):274. doi: 10.1186/1471-2105-15-274.
Microarray technology, as well as other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and the interpretability of a classifier can be enhanced by basing the classification only on selected discriminative genes. We propose a statistical method for selecting genes based on an analysis of how expression data overlap across classes. This method yields a novel measure, called the proportional overlapping score (POS), of a feature's relevance to a classification task.
We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The classification error rates obtained with the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves better performance.
A novel gene selection method, POS, is proposed. POS analyzes the overlap in expression across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene, which minimizes the effect of expression outliers. The constructed masks, along with a novel gene score, are used to produce the selected subset of genes.
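To make the idea of overlap-based scoring concrete, the following is a minimal Python sketch. It is not the POS definition, which this abstract does not spell out. It assumes two classes, treats each class's "core interval" as the range of its values that fall inside the usual 1.5 × IQR fences (an assumed way of discounting outliers), and scores a gene by the fraction of samples lying in the intersection of the two core intervals. The names `core_interval`, `overlap_fraction` and `rank_genes` are hypothetical and introduced only for illustration.

```python
import numpy as np


def core_interval(values):
    """Range of the values that lie inside the 1.5 * IQR fences.

    Assumption for illustration: this is one simple way to keep
    extreme expression values from stretching a class interval.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= lower) & (values <= upper)]
    return inside.min(), inside.max()


def overlap_fraction(expr, labels):
    """Fraction of samples inside the overlap of the two class intervals.

    A simplified stand-in for an overlap-based gene score: 0.0 means the
    core intervals of the two classes do not intersect at all.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if classes.size != 2:
        raise ValueError("this sketch handles exactly two classes")
    a_lo, a_hi = core_interval(expr[labels == classes[0]])
    b_lo, b_hi = core_interval(expr[labels == classes[1]])
    lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
    if lo > hi:  # the intervals are disjoint
        return 0.0
    return float(np.mean((expr >= lo) & (expr <= hi)))


def rank_genes(X, labels):
    """Score every column (gene) of X and order genes by increasing overlap."""
    X = np.asarray(X, dtype=float)
    scores = np.array([overlap_fraction(X[:, j], labels) for j in range(X.shape[1])])
    order = np.argsort(scores, kind="stable")
    return order, scores
```

With `X` arranged as samples by genes, `rank_genes(X, labels)` returns the gene indices ordered from least to most overlapping, so the leading indices are candidates for a selected subset. The mask construction and the final subset selection described in the abstract are not reproduced here.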