Performance of feature selection methods.

Affiliation

Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA.

Publication information

Curr Genomics. 2009 Sep;10(6):365-74. doi: 10.2174/138920209789177629.

Abstract

High-throughput biological technologies offer the promise of finding feature sets to serve as biomarkers for medical applications; however, the sheer number of potential features (genes, proteins, etc.) means that there needs to be massive feature selection, far greater than that envisioned in the classical literature. This paper considers performance analysis for feature-selection algorithms from two fundamental perspectives: how does the classification accuracy achieved with a selected feature set compare to the accuracy achieved with the best feature set, and what is the optimal number of features to use? These criteria manifest themselves in several issues that need to be considered when examining the efficacy of a feature-selection algorithm: (1) the correlation between the classifier errors for the selected feature set and the theoretically best feature set; (2) the regressions of the aforementioned errors upon one another; (3) the peaking phenomenon, that is, the effect of sample size on feature selection; and (4) the analysis of feature selection in the framework of high-dimensional models corresponding to high-throughput data.
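Issues (1) and (2) can be illustrated with a small simulation. The following is only a sketch under assumptions that are not taken from the paper: two equally likely Gaussian classes with identity covariance, a filter that ranks features by the absolute difference of sample means, a nearest-mean classifier, and the model's known informative features standing in for the theoretically best feature set. All function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

PHI = NormalDist().cdf  # standard normal CDF


def draw_sample(rng, mu, n):
    """Draw n points from N(mu, I); each point is a list of feature values."""
    return [[rng.gauss(m, 1.0) for m in mu] for _ in range(n)]


def true_error(idx, m0, m1, mu0, mu1):
    """Exact error of the nearest-mean rule on features idx (equal priors).

    m0, m1 are the sample means restricted to idx; mu0, mu1 are the full
    true class means. Valid only for the assumed identity-covariance model.
    """
    w = [b - a for a, b in zip(m0, m1)]        # direction of the linear rule
    c = [(a + b) / 2 for a, b in zip(m0, m1)]  # midpoint of the sample means
    norm = math.sqrt(sum(x * x for x in w))
    if norm == 0.0:
        return 0.5
    s0 = sum(wi * (mu0[j] - ci) for wi, ci, j in zip(w, c, idx)) / norm
    s1 = sum(wi * (mu1[j] - ci) for wi, ci, j in zip(w, c, idx)) / norm
    return 0.5 * PHI(s0) + 0.5 * PHI(-s1)


def run_trials(trials=200, n_per_class=10, d=20, k=3, delta=1.0, seed=0):
    """Return (errors with selected features, errors with best features)."""
    rng = random.Random(seed)
    mu0 = [0.0] * d
    mu1 = [delta] * k + [0.0] * (d - k)  # assumption: first k features informative
    best = list(range(k))
    err_sel, err_best = [], []
    for _ in range(trials):
        x0 = draw_sample(rng, mu0, n_per_class)
        x1 = draw_sample(rng, mu1, n_per_class)
        m0 = [mean(p[j] for p in x0) for j in range(d)]
        m1 = [mean(p[j] for p in x1) for j in range(d)]
        # Simple filter: keep the k features with the largest mean difference.
        ranked = sorted(range(d), key=lambda j: abs(m1[j] - m0[j]), reverse=True)
        sel = sorted(ranked[:k])
        err_sel.append(true_error(sel, [m0[j] for j in sel],
                                  [m1[j] for j in sel], mu0, mu1))
        err_best.append(true_error(best, [m0[j] for j in best],
                                   [m1[j] for j in best], mu0, mu1))
    return err_sel, err_best


def pearson_and_fit(x, y):
    """Pearson correlation of x and y, plus the least-squares line y = a*x + b."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return sxy / math.sqrt(sxx * syy), slope, my - slope * mx


if __name__ == "__main__":
    err_sel, err_best = run_trials()
    r, slope, intercept = pearson_and_fit(err_best, err_sel)
    print(f"mean error (selected) = {mean(err_sel):.3f}")
    print(f"mean error (best)     = {mean(err_best):.3f}")
    print(f"correlation = {r:.3f}; regression: sel = {slope:.3f} * best + {intercept:.3f}")
```

Re-running `run_trials` with different values of `n_per_class` or `k` gives a rough feel for how sample size and feature-set size interact, which is the concern behind issue (3), though this toy model is far simpler than the settings the paper analyzes.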

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/362f/2766788/533c9b52e24a/CG-10-365_F1.jpg
