Feature selection with the R package .

Authors

Tsagris Michail, Tsamardinos Ioannis

Affiliations

Department of Economics, University of Crete, Rethymnon, 74100, Greece.

Department of Computer Science, University of Crete, Heraklion, Crete, 70013, Greece.

Publication information

F1000Res. 2018 Sep 20;7:1505. doi: 10.12688/f1000research.16216.2. eCollection 2018.

Abstract

Feature (or variable) selection is the process of identifying the minimal set of features with the highest predictive performance on the target variable of interest. Numerous feature selection algorithms have been developed over the years, but only a few have been implemented in R and made publicly available as R packages, and those offer few options. The R package offers a variety of feature selection algorithms and has unique features that make it advantageous over its competitors: a) it contains feature selection algorithms that can treat numerous types of target variables, including continuous, percentages, time to event (survival), binary, nominal, ordinal, clustered, counts, left censored, etc.; b) it contains a variety of regression models that can be plugged into the feature selection algorithms (for example, with time-to-event data the user can choose among Cox, Weibull, log-logistic or exponential regression); c) it includes an algorithm for detecting multiple solutions, that is, many sets of statistically equivalent features (plainly speaking, two features carry statistically equivalent information when substituting one with the other does not affect the inference or the conclusions); and d) it includes memory-efficient algorithms for high-volume data, meaning data that cannot be loaded into R (on a machine with 16 GB of RAM, for example, R cannot directly load data of 16 GB size; by utilizing the proper package, we load the data and then perform feature selection). In this paper, we qualitatively compare the package with other relevant feature selection packages and discuss its advantages and disadvantages. Further, we provide a demonstration of the package's algorithms using real high-dimensional data from various applications.
