Multivariate filter methods for feature selection with the γ-metric.

Authors

Ngo Nicolas, Michel Pierre, Giorgi Roch

Affiliations

Aix Marseille Univ, Inserm, IRD, SESSTIM, Sciences Économiques & Sociales de la Santé & Traitement de l'Information Médicale, ISSPAM, Marseille, France.

Aix Marseille Univ, CNRS, AMSE, Aix-Marseille School of Economics, Marseille, France.

Publication

BMC Med Res Methodol. 2024 Dec 19;24(1):307. doi: 10.1186/s12874-024-02426-9.

Abstract

BACKGROUND

The γ-metric value is generally used as the importance score of a feature (or a set of features) in a classification context. This study aimed to go further by creating a new methodology for multivariate feature selection for classification, whereby the γ-metric is associated with a specific search direction (and therefore a specific stopping criterion). As three search directions are used, we effectively created three distinct methods.
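To make the idea of pairing an evaluation function with a search direction concrete, here is a minimal Python sketch of a forward-search multivariate filter. It is an illustration under stated assumptions, not the authors' method: the abstract does not define the metric, so `separation_score` is a hypothetical stand-in (distance between two class means scaled by the within-class spread), and the function names and the `min_gain` stopping threshold are likewise assumptions introduced here.

```python
import numpy as np


def separation_score(X, y):
    """Hypothetical stand-in evaluation function (NOT the metric from the paper).

    Scores a feature subset as the distance between the two class means,
    divided by the square root of the summed per-class variances.
    Assumes a binary classification problem.
    """
    classes = np.unique(y)
    a, b = X[y == classes[0]], X[y == classes[1]]
    gap = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
    spread = np.sqrt(a.var(axis=0).sum() + b.var(axis=0).sum())
    return gap / (spread + 1e-12)


def forward_select(X, y, score=separation_score, min_gain=1e-3):
    """Greedy forward search: add the feature that most improves the score.

    Stopping criterion (an assumption for this sketch): stop as soon as the
    best candidate improves the subset score by no more than `min_gain`.
    """
    remaining = list(range(X.shape[1]))
    selected = []
    best = 0.0
    while remaining:
        # Score every candidate subset formed by adding one more feature.
        candidates = [(score(X[:, selected + [j]], y), j) for j in remaining]
        cand_score, cand = max(candidates)
        if cand_score - best <= min_gain:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected
```

Calling `forward_select(X, y)` on a feature matrix `X` and binary labels `y` returns the indices of the selected columns. Because the score is computed on the whole candidate subset rather than on each feature in isolation, the filter is multivariate; a backward variant would start from all features and remove one at a time under an analogous stopping rule.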

METHODS

We assessed the performance of our new methodology through a simulation study, comparing it against more conventional methods. Classification performance indicators, number of selected features, stability and execution time were used to evaluate the performance of the methods. We also evaluated how well the proposed methodology selected relevant features for the detection of atrial fibrillation (AF), a cardiac arrhythmia.

RESULTS

We found that, in both the simulation study and the atrial fibrillation detection task, our methods were able to select informative features and maintain a good level of predictive performance; however, with strong correlation and large datasets, the γ-metric based methods were less efficient at excluding non-informative features.

CONCLUSIONS

The results highlighted that the forward search direction combines well with the γ-metric as an evaluation function. With the backward search direction, however, the feature selection algorithm could fall into a local optimum, which leaves room for improvement.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5320/11657396/386e1b7dc428/12874_2024_2426_Fig1_HTML.jpg
