
Is mutual information adequate for feature selection in regression?

Affiliation

Machine Learning Group - ICTEAM, Université catholique de Louvain, Place du Levant 3, 1348 Louvain-la-Neuve, Belgium.

Publication information

Neural Netw. 2013 Dec;48:1-7. doi: 10.1016/j.neunet.2013.07.003. Epub 2013 Jul 11.

DOI: 10.1016/j.neunet.2013.07.003
PMID: 23892907
Abstract

Feature selection is an important preprocessing step for many high-dimensional regression problems. One of the most common strategies is to select a relevant feature subset based on the mutual information criterion. However, no connection has been established yet between the use of mutual information and a regression error criterion in the machine learning literature. This is obviously an important lack, since minimising such a criterion is eventually the objective one is interested in. This paper demonstrates that under some reasonable assumptions, features selected with the mutual information criterion are the ones minimising the mean squared error and the mean absolute error. On the contrary, it is also shown that the mutual information criterion can fail in selecting optimal features in some situations that we characterise. The theoretical developments presented in this work are expected to lead in practice to a critical and efficient use of the mutual information for feature selection.
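
As an illustration of the mutual information criterion the abstract refers to, the following is a minimal sketch that ranks candidate features by an estimate of their mutual information with the regression target. It is not the paper's method: the paper is theoretical, and the histogram (plug-in) estimator, the choice of 8 bins, the synthetic data, and the helper names `discretize`, `mutual_information`, and `rank_features` are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def discretize(values, bins=8):
    """Map continuous values to equal-width bin indices (assumed binning scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=8):
    """Plug-in histogram estimate of I(X; Y) in nats."""
    xb, yb = discretize(x, bins), discretize(y, bins)
    n = len(x)
    joint = Counter(zip(xb, yb))
    px, py = Counter(xb), Counter(yb)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (px[a] * py[b]))
    return mi


def rank_features(columns, y, bins=8):
    """Return (feature name, estimated MI) pairs, highest score first."""
    scores = {name: mutual_information(col, y, bins) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic data: y depends on x1 only; x2 is pure noise.
random.seed(0)
n = 2000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [a + 0.1 * random.gauss(0, 1) for a in x1]

ranking = rank_features({"x1": x1, "x2": x2}, y)
for name, score in ranking:
    print(f"{name}: estimated MI = {score:.3f} nats")
```

In this toy setting the informative feature `x1` should rank above the noise feature `x2`. Whether such a ranking also minimises the mean squared error or the mean absolute error in general is the question the paper addresses.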

Similar articles

1
Is mutual information adequate for feature selection in regression?
Neural Netw. 2013 Dec;48:1-7. doi: 10.1016/j.neunet.2013.07.003. Epub 2013 Jul 11.
2
Partial distortion entropy maximization for online data clustering.
Neural Netw. 2007 Sep;20(7):819-31. doi: 10.1016/j.neunet.2007.04.029. Epub 2007 Jul 6.
3
Feature selection based on mutual information and redundancy-synergy coefficient.
J Zhejiang Univ Sci. 2004 Nov;5(11):1382-91. doi: 10.1631/jzus.2004.1382.
4
Dimensionality reduction for density ratio estimation in high-dimensional spaces.
Neural Netw. 2010 Jan;23(1):44-59. doi: 10.1016/j.neunet.2009.07.007. Epub 2009 Jul 18.
5
A general procedure to generate models for urban environmental-noise pollution using feature selection and machine learning methods.
Sci Total Environ. 2015 Feb 1;505:680-93. doi: 10.1016/j.scitotenv.2014.08.060. Epub 2014 Oct 30.
6
Boosting feature selection for Neural Network based regression.
Neural Netw. 2009 Jul-Aug;22(5-6):748-56. doi: 10.1016/j.neunet.2009.06.039. Epub 2009 Jul 2.
7
Simultaneous feature selection and clustering using mixture models.
IEEE Trans Pattern Anal Mach Intell. 2004 Sep;26(9):1154-66. doi: 10.1109/TPAMI.2004.71.
8
Minimax mutual information approach for independent component analysis.
Neural Comput. 2004 Jun;16(6):1235-52. doi: 10.1162/089976604773717595.
9
What should be expected from feature selection in small-sample settings.
Bioinformatics. 2006 Oct 1;22(19):2430-6. doi: 10.1093/bioinformatics/btl407. Epub 2006 Jul 26.
10
Feature selection with kernel class separability.
IEEE Trans Pattern Anal Mach Intell. 2008 Sep;30(9):1534-46. doi: 10.1109/TPAMI.2007.70799.

Cited by

1
Prediction and Interpretability Study of the Glass Transition Temperature of Polyimide Based on Machine Learning and Molecular Dynamics Simulations.
Polymers (Basel). 2025 Jul 30;17(15):2083. doi: 10.3390/polym17152083.
2
Cost function for low-dimensional manifold topology assessment.
Sci Rep. 2022 Aug 25;12(1):14496. doi: 10.1038/s41598-022-18655-1.
3
Machine Learning-Based Radiomics for Prediction of Epidermal Growth Factor Receptor Mutations in Lung Adenocarcinoma.
Dis Markers. 2022 May 7;2022:2056837. doi: 10.1155/2022/2056837. eCollection 2022.
4
Predicting and Preventing Nocturnal Hypoglycemia in Type 1 Diabetes Using Big Data Analytics and Decision Theoretic Analysis.
Diabetes Technol Ther. 2020 Nov;22(11):801-811. doi: 10.1089/dia.2019.0458. Epub 2020 May 14.
5
An Improved Normalized Mutual Information Variable Selection Algorithm for Neural Network-Based Soft Sensors.
Sensors (Basel). 2019 Dec 5;19(24):5368. doi: 10.3390/s19245368.
6
Scalable Nonparametric Prescreening Method for Searching Higher-Order Genetic Interactions Underlying Quantitative Traits.
Genetics. 2019 Dec;213(4):1209-1224. doi: 10.1534/genetics.119.302658. Epub 2019 Oct 4.