Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2012 Jan;34(1):174-86. doi: 10.1109/TPAMI.2011.82. Epub 2011 May 12.

Abstract

One of the objectives of designing feature selection learning algorithms is to obtain classifiers that depend on a small number of attributes and have verifiable future performance guarantees. There are few, if any, approaches that successfully address the two goals simultaneously. To the best of our knowledge, such algorithms that give theoretical bounds on the future performance have not been proposed so far in the context of the classification of gene expression data. In this work, we investigate the premise of learning a conjunction (or disjunction) of decision stumps in Occam's Razor, Sample Compression, and PAC-Bayes learning settings for identifying a small subset of attributes that can be used to perform reliable classification tasks. We apply the proposed approaches for gene identification from DNA microarray data and compare our results to those of the well-known successful approaches proposed for the task. We show that our algorithm not only finds hypotheses with a much smaller number of genes while giving competitive classification accuracy but also has tight risk guarantees on future performance, unlike other approaches. The proposed approaches are general and extensible in terms of both designing novel algorithms and application to other domains.
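To make the central object concrete, the following is a minimal sketch of a conjunction of decision stumps: each stump thresholds a single attribute (for example, one gene's expression level), and the conjunction predicts positive only when every stump agrees. The greedy set-cover style fitting routine, the names `Stump`, `predict`, `candidate_stumps`, and `fit_conjunction`, and the toy data are all illustrative assumptions; this is not the authors' algorithm, and it does not compute any of the Occam's Razor, Sample Compression, or PAC-Bayes risk bounds mentioned in the abstract.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass(frozen=True)
class Stump:
    """One-attribute threshold test (hypothetical helper, not from the paper)."""
    feature: int      # index of the attribute, e.g. a gene
    threshold: float
    direction: int    # +1: true when x[feature] > threshold; -1: true otherwise

    def __call__(self, x: Sequence[float]) -> bool:
        above = x[self.feature] > self.threshold
        return above if self.direction == 1 else not above


def predict(stumps: List[Stump], x: Sequence[float]) -> bool:
    """A conjunction is positive only if every stump outputs True."""
    return all(s(x) for s in stumps)


def candidate_stumps(X: Sequence[Sequence[float]]) -> Iterator[Stump]:
    """Enumerate stumps at midpoints between consecutive observed values."""
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            yield Stump(j, t, 1)
            yield Stump(j, t, -1)


def fit_conjunction(X, y, max_stumps: int = 3) -> List[Stump]:
    """Greedy heuristic (an assumption of this sketch): repeatedly add the
    stump that accepts every positive example and rejects the most
    negatives that the current conjunction still accepts."""
    positives = [x for x, label in zip(X, y) if label]
    negatives = [x for x, label in zip(X, y) if not label]
    candidates = list(candidate_stumps(X))
    chosen: List[Stump] = []
    while negatives and len(chosen) < max_stumps:
        best, best_rejected = None, 0
        for s in candidates:
            if not all(s(p) for p in positives):
                continue  # a conjunction must keep all positives
            rejected = sum(1 for n in negatives if not s(n))
            if rejected > best_rejected:
                best, best_rejected = s, rejected
        if best is None:
            break  # no consistent stump removes any remaining negative
        chosen.append(best)
        negatives = [n for n in negatives if best(n)]
    return chosen


# Toy data: three attributes, positive when attribute 0 is high and attribute 2 is low.
X = [
    [0.9, 0.1, 0.2],
    [0.8, 0.7, 0.1],
    [0.2, 0.3, 0.3],
    [0.1, 0.9, 0.2],
    [0.9, 0.5, 0.9],
]
y = [True, True, False, False, False]

model = fit_conjunction(X, y)
print(sorted({s.feature for s in model}))  # attributes the conjunction depends on
```

Capping `max_stumps` is what keeps the number of selected attributes small in this sketch; a disjunction can be handled the same way by swapping the roles of the positive and negative classes.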

