
Evaluating stability and comparing output of feature selectors that optimize feature subset cardinality.

Affiliation

Department of Pattern Recognition, Institute of Information Theory and Automation of the Czech Academy of Sciences, Prague, Czech Republic.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2010 Nov;32(11):1921-39. doi: 10.1109/TPAMI.2010.34.

DOI: 10.1109/TPAMI.2010.34
PMID: 20847385
Abstract

Stability (robustness) of feature selection methods is a topic of recent interest, yet often neglected importance, with direct impact on the reliability of machine learning systems. We investigate the problem of evaluating the stability of feature selection processes yielding subsets of varying size. We introduce several novel feature selection stability measures and adjust some existing measures in a unifying framework that offers broad insight into the stability problem. We study in detail the properties of considered measures and demonstrate on various examples what information about the feature selection process can be gained. We also introduce an alternative approach to feature selection evaluation in the form of measures that enable comparing the similarity of two feature selection processes. These measures enable comparing, e.g., the output of two feature selection methods or two runs of one method with different parameters. The information obtained using the considered stability and similarity measures is shown to be usable for assessing feature selection methods (or criteria) as such.
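To make the idea concrete, the following is a minimal sketch of one simple way to score stability when the selected subsets differ in size, and to compare two selection processes. It uses the average pairwise Jaccard (Tanimoto) similarity, which is an illustrative choice and not necessarily one of the measures the paper introduces. The function names `jaccard`, `average_pairwise_stability`, and `inter_process_similarity` are hypothetical, and the convention that two empty subsets have similarity 1 is an assumption.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard (Tanimoto) similarity of two feature subsets of any size."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty subsets count as identical
    return len(a & b) / len(a | b)


def average_pairwise_stability(subsets):
    """Mean Jaccard similarity over all pairs of subsets from one process.

    `subsets` holds the feature subsets selected in repeated runs
    (for example, on different resamplings of the training data).
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def inter_process_similarity(runs_a, runs_b):
    """Mean Jaccard similarity over all cross pairs between two processes."""
    if not runs_a or not runs_b:
        raise ValueError("both processes need at least one subset")
    total = sum(jaccard(a, b) for a in runs_a for b in runs_b)
    return total / (len(runs_a) * len(runs_b))


if __name__ == "__main__":
    # Hypothetical outputs of one selector over three resampled training sets.
    runs = [{0, 1, 2}, {0, 1}, {0, 1, 2, 5}]
    print("stability:", average_pairwise_stability(runs))

    # Hypothetical outputs of a second selector, compared against the first.
    other = [{0, 1}, {0, 3}]
    print("similarity:", inter_process_similarity(runs, other))
```

A value near 1 means the runs agree on almost all selected features; a value near 0 means the selected subsets barely overlap. Because Jaccard similarity divides by the size of the union, it can be applied directly to subsets of different cardinality.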

Similar articles

1
Evaluating stability and comparing output of feature selectors that optimize feature subset cardinality.
IEEE Trans Pattern Anal Mach Intell. 2010 Nov;32(11):1921-39. doi: 10.1109/TPAMI.2010.34.
2
A novel feature selection approach for biomedical data classification.
J Biomed Inform. 2010 Feb;43(1):15-23. doi: 10.1016/j.jbi.2009.07.008. Epub 2009 Jul 30.
3
Feature selection based on mutual information and redundancy-synergy coefficient.
J Zhejiang Univ Sci. 2004 Nov;5(11):1382-91. doi: 10.1631/jzus.2004.1382.
4
Fast branch & bound algorithms for optimal feature selection.
IEEE Trans Pattern Anal Mach Intell. 2004 Jul;26(7):900-12. doi: 10.1109/TPAMI.2004.28.
5
A study on feature analysis for musical instrument classification.
IEEE Trans Syst Man Cybern B Cybern. 2008 Apr;38(2):429-38. doi: 10.1109/TSMCB.2007.913394.
6
Feature selection with kernel class separability.
IEEE Trans Pattern Anal Mach Intell. 2008 Sep;30(9):1534-46. doi: 10.1109/TPAMI.2007.70799.
7
Bayesian feature and model selection for Gaussian mixture models.
IEEE Trans Pattern Anal Mach Intell. 2006 Jun;28(6):1013-8. doi: 10.1109/TPAMI.2006.111.
8
Simultaneous localized feature selection and model detection for gaussian mixtures.
IEEE Trans Pattern Anal Mach Intell. 2009 May;31(5):953-60. doi: 10.1109/TPAMI.2008.261.
9
Simultaneous feature selection and clustering using mixture models.
IEEE Trans Pattern Anal Mach Intell. 2004 Sep;26(9):1154-66. doi: 10.1109/TPAMI.2004.71.
10
Unsupervised feature selection under perturbations: meeting the challenges of biological data.
Bioinformatics. 2007 Dec 15;23(24):3343-9. doi: 10.1093/bioinformatics/btm528. Epub 2007 Nov 7.

Cited by

1
Benchmarking ensemble machine learning algorithms for multi-class, multi-omics data integration in clinical outcome prediction.
Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf116.
2
Ensemble feature selection with data-driven thresholding for Alzheimer's disease biomarker discovery.
BMC Bioinformatics. 2023 Jan 9;24(1):9. doi: 10.1186/s12859-022-05132-9.
3
Evaluation of Feature Selection Techniques for Breast Cancer Risk Prediction.
Int J Environ Res Public Health. 2021 Oct 12;18(20):10670. doi: 10.3390/ijerph182010670.
4
A filter approach for feature selection in classification: application to automatic atrial fibrillation detection in electrocardiogram recordings.
BMC Med Inform Decis Mak. 2021 May 4;21(Suppl 4):130. doi: 10.1186/s12911-021-01427-8.
5
Feature Selection Stability and Accuracy of Prediction Models for Genomic Prediction of Residual Feed Intake in Pigs Using Machine Learning.
Front Genet. 2021 Feb 22;12:611506. doi: 10.3389/fgene.2021.611506. eCollection 2021.
6
Machine Learning Prediction of Crossbred Pig Feed Efficiency and Growth Rate From Single Nucleotide Polymorphisms.
Front Genet. 2020 Dec 18;11:567818. doi: 10.3389/fgene.2020.567818. eCollection 2020.
7
NOREVA: enhanced normalization and evaluation of time-course and multi-class metabolomic data.
Nucleic Acids Res. 2020 Jul 2;48(W1):W436-W448. doi: 10.1093/nar/gkaa258.
8
Treatment Outcome Prediction for Cancer Patients based on Radiomics and Belief Function Theory.
IEEE Trans Radiat Plasma Med Sci. 2019 Mar;3(2):216-224. doi: 10.1109/TRPMS.2018.2872406. Epub 2018 Sep 27.
9
Robust clinical marker identification for diabetic kidney disease with ensemble feature selection.
J Am Med Inform Assoc. 2019 Mar 1;26(3):242-253. doi: 10.1093/jamia/ocy165.
10
A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data.
Comput Math Methods Med. 2017;2017:7907163. doi: 10.1155/2017/7907163. Epub 2017 Aug 1.