Performance of feature selection methods.

Affiliation

Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA.

Publication information

Curr Genomics. 2009 Sep;10(6):365-74. doi: 10.2174/138920209789177629.

DOI:10.2174/138920209789177629
PMID:20190952
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2766788/
Abstract

High-throughput biological technologies offer the promise of finding feature sets to serve as biomarkers for medical applications; however, the sheer number of potential features (genes, proteins, etc.) means that there needs to be massive feature selection, far greater than that envisioned in the classical literature. This paper considers performance analysis for feature-selection algorithms from two fundamental perspectives: How does the classification accuracy achieved with a selected feature set compare to the accuracy when the best feature set is used and what is the optimal number of features that should be used? The criteria manifest themselves in several issues that need to be considered when examining the efficacy of a feature-selection algorithm: (1) the correlation between the classifier errors for the selected feature set and the theoretically best feature set; (2) the regressions of the aforementioned errors upon one another; (3) the peaking phenomenon, that is, the effect of sample size on feature selection; and (4) the analysis of feature selection in the framework of high-dimensional models corresponding to high-throughput data.
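
The comparison the abstract describes, error with a selected feature set versus error with the best feature set, together with the peaking phenomenon, can be illustrated on synthetic data. The sketch below is not the paper's experimental setup. It assumes a hypothetical two-class Gaussian model with identity covariance in which only the first few features differ in mean, a simple filter that ranks features by the absolute difference of sample means, and a nearest-mean classifier. Every name and parameter (`run`, `true_error`, `delta`, the feature counts) is invented for illustration. Because the model is known, the true error of each designed classifier can be computed in closed form rather than estimated.

```python
import math
import random


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample(mu, n, rng):
    """Draw n points from N(mu, I)."""
    return [[rng.gauss(m, 1.0) for m in mu] for _ in range(n)]


def mean_vec(points):
    """Per-feature sample mean."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def true_error(feats, m0_hat, m1_hat, mu0, mu1):
    """Exact error of a nearest-mean classifier restricted to `feats`,
    under the assumed model N(mu0, I) vs N(mu1, I) with equal priors."""
    w = [m1_hat[j] - m0_hat[j] for j in feats]
    mid = [(m1_hat[j] + m0_hat[j]) / 2.0 for j in feats]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        return 0.5
    # The classifier picks class 1 when w.(x - mid) > 0.
    # Under class k, w.(x - mid) / norm is distributed N(s_k, 1).
    s0 = sum(wi * (mu0[j] - c) for wi, j, c in zip(w, feats, mid)) / norm
    s1 = sum(wi * (mu1[j] - c) for wi, j, c in zip(w, feats, mid)) / norm
    return 0.5 * (phi(s0) + phi(-s1))


def run(n_per_class=10, n_features=50, n_informative=5, delta=0.8,
        trials=200, seed=0):
    """Average true error versus number of selected features (assumed setup)."""
    rng = random.Random(seed)
    mu0 = [0.0] * n_features
    mu1 = [delta] * n_informative + [0.0] * (n_features - n_informative)
    sizes = [1, 2, 5, 10, 20, 50]
    avg = {d: 0.0 for d in sizes}
    avg_best = 0.0
    best = list(range(n_informative))  # the truly informative features
    for _ in range(trials):
        m0 = mean_vec(sample(mu0, n_per_class, rng))
        m1 = mean_vec(sample(mu1, n_per_class, rng))
        # Filter ranking: largest absolute difference of sample means first.
        ranked = sorted(range(n_features), key=lambda j: -abs(m1[j] - m0[j]))
        for d in sizes:
            avg[d] += true_error(ranked[:d], m0, m1, mu0, mu1) / trials
        avg_best += true_error(best, m0, m1, mu0, mu1) / trials
    return avg, avg_best


if __name__ == "__main__":
    avg, avg_best = run()
    for d, err in avg.items():
        print(f"selected d={d:2d}: mean true error {err:.3f}")
    print(f"best feature set: mean true error {avg_best:.3f}")
```

Printing the averages for several values of `d` at a fixed `n_per_class` gives a rough error-versus-feature-count curve, and rerunning with a different sample size shows how that curve shifts. The per-trial errors for the selected and best sets could also be collected and correlated or regressed on one another, as the abstract's first two criteria suggest.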

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/362f/2766788/533c9b52e24a/CG-10-365_F1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/362f/2766788/25398e4d152d/CG-10-365_F2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/362f/2766788/a36e3aa7342d/CG-10-365_F3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/362f/2766788/76c2ed58f008/CG-10-365_F4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/362f/2766788/36de384d66f5/CG-10-365_F5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/362f/2766788/1186212a01f8/CG-10-365_F6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/362f/2766788/e357966c75d4/CG-10-365_F7.jpg
