
The accuracy of Random Forest performance can be improved by conducting a feature selection with a balancing strategy.

Authors

Prasetiyowati Maria Irmina, Maulidevi Nur Ulfa, Surendro Kridanto

Affiliations

Doctoral Program of Electrical Engineering and Informatics, School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Jawa Barat, Indonesia.

Department of Electrical Engineering and Informatics, School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Jawa Barat, Indonesia.

Publication information

PeerJ Comput Sci. 2022 Jul 14;8:e1041. doi: 10.7717/peerj-cs.1041. eCollection 2022.

DOI: 10.7717/peerj-cs.1041
PMID: 35875646
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9299283/
Abstract

Feature selection is commonly used to raise a model's accuracy while shortening the time needed to build it. One approach ranks the features of a dataset by Information Gain (IG), which measures how much information each feature contains; features with high IG values are selected to speed up the algorithm. Deciding which features count as informative requires a threshold (cut-off) on the IG values. This research therefore aims to improve feature selection, in terms of both time and accuracy, by integrating IG, the Fast Fourier Transform (FFT), and the Synthetic Minority Oversampling Technique (SMOTE). The feature selection model is then applied to Random Forest, a tree-based machine learning algorithm with random feature selection. Eight datasets were used in this research: three balanced and five imbalanced. SMOTE was applied to the imbalanced datasets to balance the data. The results showed that feature selection using Information Gain, FFT, and SMOTE improved the accuracy of Random Forest.
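
The IG-with-threshold step described in the abstract can be illustrated with a minimal sketch. It assumes discrete features and a fixed, hand-picked cut-off; the abstract does not say how the FFT enters the choice of threshold, so that part is not modelled here. The function names, the toy data, and the value `threshold=0.5` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(Y) - sum over v of P(X=v) * H(Y | X=v), for a discrete feature X."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(columns, labels, threshold):
    """Score every feature and keep those whose IG reaches the threshold."""
    scores = {name: information_gain(values, labels) for name, values in columns.items()}
    selected = [name for name, score in scores.items() if score >= threshold]
    return selected, scores


# Hypothetical toy data: one feature that determines the class, one that does not.
labels = ["yes", "yes", "no", "no"]
columns = {
    "informative": ["a", "a", "b", "b"],
    "noise": ["x", "y", "x", "y"],
}

# The cut-off is an assumed constant, chosen only for illustration.
selected, scores = select_features(columns, labels, threshold=0.5)
print(selected)
print(scores)
```

Only the features that pass the cut-off would then be handed to the classifier, which is where the time saving mentioned in the abstract would come from.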

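
The abstract states that SMOTE was used to balance the imbalanced datasets. As a rough illustration of the idea behind SMOTE, the sketch below creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. It is a simplified pure-Python version for intuition only, not the implementation used in the paper; the function name `smote_sketch`, the parameter `k=2`, the fixed seed, and the toy points are all assumptions.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points on segments between minority neighbours.

    Assumes `minority` holds at least two numeric tuples of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical imbalanced toy data: six majority points, three minority points.
majority = [(5.0, 5.0), (6.0, 5.5), (5.5, 6.0), (6.5, 6.5), (7.0, 5.0), (6.0, 7.0)]
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0)]

# Generate just enough synthetic minority points to match the majority count.
new_points = smote_sketch(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because each synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies rather than duplicating rows exactly.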

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b202/9299283/19222efc4676/peerj-cs-08-1041-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b202/9299283/fce6ab9ef277/peerj-cs-08-1041-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b202/9299283/29aecdb69eb2/peerj-cs-08-1041-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b202/9299283/2ebb720b1fa4/peerj-cs-08-1041-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b202/9299283/39ba6c0ca91c/peerj-cs-08-1041-g005.jpg

Similar articles

1. The accuracy of Random Forest performance can be improved by conducting a feature selection with a balancing strategy.
PeerJ Comput Sci. 2022 Jul 14;8:e1041. doi: 10.7717/peerj-cs.1041. eCollection 2022.
2. Clinical data classification using an enhanced SMOTE and chaotic evolutionary feature selection.
Comput Biol Med. 2020 Nov;126:103991. doi: 10.1016/j.compbiomed.2020.103991. Epub 2020 Sep 18.
3. Hybrid model for precise hepatitis-C classification using improved random forest and SVM method.
Sci Rep. 2023 Aug 1;13(1):12473. doi: 10.1038/s41598-023-36605-3.
4. CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests.
BMC Bioinformatics. 2017 Mar 14;18(1):169. doi: 10.1186/s12859-017-1578-z.
5. Deep Learning Feature Extraction Approach for Hematopoietic Cancer Subtype Classification.
Int J Environ Res Public Health. 2021 Feb 23;18(4):2197. doi: 10.3390/ijerph18042197.
6. Structure-activity relationship-based chemical classification of highly imbalanced Tox21 datasets.
J Cheminform. 2020 Oct 27;12(1):66. doi: 10.1186/s13321-020-00468-x.
7. Wireless Sensor Networks Intrusion Detection Based on SMOTE and the Random Forest Algorithm.
Sensors (Basel). 2019 Jan 8;19(1):203. doi: 10.3390/s19010203.
8. Machine learning model for predicting malaria using clinical information.
Comput Biol Med. 2021 Feb;129:104151. doi: 10.1016/j.compbiomed.2020.104151. Epub 2020 Nov 28.
9. A Synthetic Minority Oversampling Technique Based on Gaussian Mixture Model Filtering for Imbalanced Data Classification.
IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):3740-3753. doi: 10.1109/TNNLS.2022.3197156. Epub 2024 Feb 29.
10. Drug-Protein Interactions Prediction Models Using Feature Selection and Classification Techniques.
Curr Drug Metab. 2023;24(12):817-834. doi: 10.2174/0113892002268739231211063718.

Cited by

1. Advancing accuracy in breath testing for lung cancer: strategies for improving diagnostic precision in imbalanced data.
Respir Res. 2024 Jan 16;25(1):32. doi: 10.1186/s12931-024-02668-7.
2. Feature selection based on neighborhood rough sets and Gini index.
PeerJ Comput Sci. 2023 Dec 12;9:e1711. doi: 10.7717/peerj-cs.1711. eCollection 2023.
3. Smart Flood Detection with AI and Blockchain Integration in Saudi Arabia Using Drones.
Sensors (Basel). 2023 May 28;23(11):5148. doi: 10.3390/s23115148.

References

1. Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state.
Phys Rev E Stat Nonlin Soft Matter Phys. 2001 Dec;64(6 Pt 1):061907. doi: 10.1103/PhysRevE.64.061907. Epub 2001 Nov 20.