Analysis of recursive gene selection approaches from microarray data.

Author information

Li Fan, Yang Yiming

Affiliation

Language Technology Institute, 4502 NSH, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA.

Publication information

Bioinformatics. 2005 Oct 1;21(19):3741-7. doi: 10.1093/bioinformatics/bti618. Epub 2005 Aug 23.

Abstract

MOTIVATION

Finding a small subset of the most predictive genes in microarray data for disease prediction is a challenging problem. Support vector machines (SVMs) combined with a recursive procedure have been used successfully to select important genes for cancer prediction. However, it is not well understood how much of this success depends on the choice of the specific classifier and how much on the recursive procedure. We answer this question by examining multiple classifiers [SVM, ridge regression (RR) and Rocchio] with feature selection in recursive and non-recursive settings on three DNA microarray datasets (ALL-AML Leukemia data, Breast Cancer data and GCM data).
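
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of recursive (RFE-style) selection with a ridge-regression classifier, next to a non-recursive variant that ranks genes once. It assumes labels in {-1, +1}, no intercept, a regularization strength lam = 1 and one gene removed per round. These choices, the function names (`ridge_weights`, `recursive_ridge_selection`, `nonrecursive_ridge_selection`) and the toy data are illustrative assumptions, not the authors' settings. Plain Python is used so the block runs without third-party packages.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ridge_weights(X: Matrix, y: Sequence[int], lam: float = 1.0) -> List[float]:
    """Ridge regression w = (X^T X + lam*I)^-1 X^T y, no intercept (assumed).

    lam > 0 keeps the system positive definite, so solve() has nonzero pivots.
    """
    d = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(gram, rhs)


def recursive_ridge_selection(X: Matrix, y: Sequence[int], k: int,
                              lam: float = 1.0) -> List[int]:
    """Retrain on the surviving genes and drop the smallest |w| each round."""
    active = list(range(len(X[0])))
    while len(active) > k:
        sub = [[row[j] for j in active] for row in X]
        w = ridge_weights(sub, y, lam)
        worst = min(range(len(active)), key=lambda i: abs(w[i]))
        del active[worst]
    return active


def nonrecursive_ridge_selection(X: Matrix, y: Sequence[int], k: int,
                                 lam: float = 1.0) -> List[int]:
    """Train once on all genes and keep the k largest |w|."""
    w = ridge_weights(X, y, lam)
    ranked = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: 4 samples x 3 genes, labels in {-1, +1}.
# Gene 0 matches the labels; genes 1 and 2 are orthogonal to them.
X = [[1.0, 1.0, 1.0],
     [1.0, -1.0, -1.0],
     [-1.0, 1.0, -1.0],
     [-1.0, -1.0, 1.0]]
y = [1, 1, -1, -1]

print(recursive_ridge_selection(X, y, k=1))     # [0]
print(nonrecursive_ridge_selection(X, y, k=1))  # [0]
```

On this toy data both variants agree. The recursive variant differs in that it retrains after every removal, so the weights of the surviving genes can readjust once correlated genes have been eliminated; swapping `ridge_weights` for another linear classifier's weights gives the corresponding recursive variant for that classifier.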

RESULTS

We found recursive RR to be the most effective. On the ALL-AML dataset, it achieved a zero error rate on the test set using only three genes (selected from over 7000), which is more encouraging than the best published result (a zero error rate using eight genes with recursive SVM). On the Breast Cancer dataset and the two largest categories of the GCM dataset, the results achieved by recursive RR are also very encouraging. Further analysis of the experimental results shows that different classifiers penalize redundant features to different extents, and that this property plays an important role in the recursive feature selection process. The RR classifier tends to penalize redundant features much more heavily than the SVM does, which may explain why recursive RR performs better at selecting genes.
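
One mechanism by which a linear classifier can penalize redundant features is visible in a toy case where a gene is simply duplicated. The sketch below compares the same assumed no-intercept ridge model with an assumed plain centroid-difference form of Rocchio (positive centroid minus negative centroid, no extra coefficients or normalization). Under ridge regression the L2 penalty spreads the weight equally across identical copies, so each copy's |w| drops (from 4/5 to 4/9 here) and the copies look less important in a weight-ranked elimination; the centroid-difference weights are computed per feature and do not change. The data and function names are hypothetical, and the SVM comparison reported in the abstract is not reproduced, since that would require a quadratic-programming solver.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ridge_weights(X: Matrix, y: Sequence[int], lam: float = 1.0) -> List[float]:
    """Ridge regression w = (X^T X + lam*I)^-1 X^T y, no intercept (assumed)."""
    d = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(gram, rhs)


def rocchio_weights(X: Matrix, y: Sequence[int]) -> List[float]:
    """Assumed Rocchio variant: positive centroid minus negative centroid."""
    pos = [row for row, yi in zip(X, y) if yi > 0]
    neg = [row for row, yi in zip(X, y) if yi < 0]
    d = len(X[0])
    return [sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
            for j in range(d)]


# Hypothetical expression profile of one informative gene across 4 samples.
y = [1, 1, -1, -1]
gene = [1.0, 1.0, -1.0, -1.0]
single = [[g] for g in gene]      # the gene appears once
doubled = [[g, g] for g in gene]  # the same gene appears twice

print(ridge_weights(single, y))     # [0.8]
print(ridge_weights(doubled, y))    # each copy is about 0.444 (4/9): weight is split
print(rocchio_weights(single, y))   # [2.0]
print(rocchio_weights(doubled, y))  # [2.0, 2.0]: copies keep full weight
```

In a recursive procedure that removes the smallest |w| first, the ridge behaviour shown here makes duplicated genes early candidates for removal, while the centroid-difference weights give no such signal.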
