Feature selection via robust weighted score for high dimensional binary class-imbalanced gene expression data.

Authors

Khan Zardad, Ali Amjad, Aldahmani Saeed

Affiliation

Department of Statistics and Business Analytics, United Arab Emirates University, Al Ain, United Arab Emirates.

Publication

Heliyon. 2024 Sep 30;10(19):e38547. doi: 10.1016/j.heliyon.2024.e38547. eCollection 2024 Oct 15.

DOI:10.1016/j.heliyon.2024.e38547
PMID:39398002
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11471177/
Abstract

In this paper, a robust weighted score for unbalanced data (ROWSU) is proposed for selecting the most discriminative features for high-dimensional binary classification of gene expression data with a class-imbalance problem. The method addresses one of the most challenging problems in gene expression datasets, highly skewed class distributions, which adversely affect the performance of classification algorithms. First, the training dataset is balanced by synthetically generating data points from minority class observations. Second, a minimum subset of genes is selected using a greedy search approach. Third, a novel weighted robust score, whose weights are computed from support vectors, is introduced to obtain a refined set of genes. The highest-scoring genes under this score are combined with the minimum subset of genes selected by the greedy search to form the final set of genes. The novel method ensures the selection of the most discriminative genes, even in the presence of skewed class distributions, thereby improving the performance of the classifiers. The performance of the proposed ROWSU method is evaluated on 7 gene expression datasets. Classification accuracy, sensitivity and F-score are used as performance metrics to compare the proposed ROWSU algorithm with several other state-of-the-art methods. Boxplots and stability plots are also constructed for a better understanding of the results. The results show that the proposed method outperforms the existing feature selection procedures in terms of the classification performance of nearest neighbors (NN) and random forest (RF) classifiers.
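The first step, balancing the training set by synthesizing minority-class points, is commonly done with SMOTE-style interpolation. The abstract does not name the exact generator, so the following is a minimal sketch assuming SMOTE-like interpolation toward one of the k nearest minority neighbours; the function names `smote_oversample` and `balance`, the default `k=5` and the label coding (minority = `1`) are illustrative assumptions, not taken from the paper.

```python
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating a random minority row
    toward one of its k nearest minority neighbours (SMOTE-style sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding base itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position on the segment base -> nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance(X, y, minority_label=1, seed=0):
    """Append synthetic minority rows until both classes have equal counts."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_rows = smote_oversample(minority, n_new, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [6, 6], [5.5, 5.5]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
Xb, yb = balance(X, y)
print(yb.count(0), yb.count(1))  # 5 5
```

Because each synthetic row lies on a segment between two real minority rows, the new points stay inside the region the minority class already occupies; only the training split should be balanced this way, so that test data remain untouched.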

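For the second step, the abstract says only that a minimum gene subset is found by greedy search. One plausible reading, assumed here, is forward selection that adds one gene at a time while the leave-one-out accuracy of a 1-nearest-neighbour classifier keeps improving; the stopping rule, the 1-NN scorer and the `max_features` cap are assumptions for illustration, not the paper's stated criterion.

```python
def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that only
    looks at the given feature (gene) indices. Needs at least two samples."""
    if not features:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if j == i:
                continue  # leave the query sample out
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def greedy_forward_selection(X, y, max_features=10):
    """Add the gene that most improves LOO 1-NN accuracy until no gene helps,
    accuracy reaches 1.0, or max_features genes have been chosen."""
    selected, best_acc = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scores = {f: loo_1nn_accuracy(X, y, selected + [f]) for f in remaining}
        # highest accuracy wins; ties go to the lower gene index
        f_best = max(scores, key=lambda f: (scores[f], -f))
        if scores[f_best] <= best_acc:
            break  # no improvement: keep the smaller subset
        selected.append(f_best)
        best_acc = scores[f_best]
        remaining.discard(f_best)
        if best_acc == 1.0:
            break
    return selected, best_acc
```

Stopping as soon as no candidate gene improves accuracy is what keeps the subset small; in a real pipeline this would run on the balanced training data from the first step.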

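For the third step, the abstract gives no formula for the weighted robust score beyond saying that its weights are computed from support vectors. The sketch below therefore assumes a score equal to the absolute difference of the two classes' weighted medians divided by the sum of their weighted median absolute deviations (MAD), with labels coded `0` and `1` and per-sample weights supplied by the caller. The helper `support_vector_weights`, which upweights support vectors by a hypothetical factor `sv_weight`, is an illustrative choice and not the paper's weighting; support-vector indices could come from any SVM fit, for example the `support_` attribute of scikit-learn's `SVC`. The top-scoring genes are then merged with the greedy subset, as the abstract describes.

```python
def weighted_median(values, weights):
    """Lower weighted median: smallest value whose cumulative weight
    reaches half of the total weight."""
    total = sum(weights)
    cum = 0.0
    for v, w in sorted(zip(values, weights)):
        cum += w
        if cum >= total / 2:
            return v
    raise ValueError("empty input or non-positive total weight")


def support_vector_weights(n_samples, sv_indices, sv_weight=2.0):
    """Hypothetical weighting: support vectors get sv_weight, others 1.0."""
    sv = set(sv_indices)
    return [sv_weight if i in sv else 1.0 for i in range(n_samples)]


def robust_weighted_score(column, y, w, eps=1e-12):
    """|weighted median difference| / (sum of weighted MADs) for one gene."""
    stats = []
    for label in (0, 1):
        vals = [v for v, lab in zip(column, y) if lab == label]
        wts = [wt for wt, lab in zip(w, y) if lab == label]
        med = weighted_median(vals, wts)
        mad = weighted_median([abs(v - med) for v in vals], wts)
        stats.append((med, mad))
    (m0, s0), (m1, s1) = stats
    return abs(m1 - m0) / (s0 + s1 + eps)  # eps guards against zero spread


def combine_with_greedy(X, y, w, greedy_subset, top_k):
    """Score every gene, keep the top_k, and union them with the greedy subset."""
    n_genes = len(X[0])
    scores = [robust_weighted_score([row[g] for row in X], y, w)
              for g in range(n_genes)]
    top = sorted(range(n_genes), key=lambda g: -scores[g])[:top_k]
    final = list(greedy_subset) + [g for g in top if g not in greedy_subset]
    return final, scores
```

With uniform weights this reduces to an unweighted median/MAD ratio; non-uniform weights only shift which samples dominate the class medians. Medians and MADs are used here because, unlike means and standard deviations, they are not pulled far by a few outlying expression values.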
Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5933/11471177/d1f825ed20a3/gr001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5933/11471177/4120f3949320/gr002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5933/11471177/c604864f9905/gr003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5933/11471177/2f212ad180ea/gr004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5933/11471177/5633c24eb41b/gr005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5933/11471177/722d6b92ea08/gr006.jpg

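The abstract compares methods by classification accuracy, sensitivity and F-score. For a binary problem all three follow from confusion-matrix counts; the sketch below assumes the minority class is coded as the positive label `1` and uses the common F1 form of the F-score (the paper's exact convention is not stated in the abstract).

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall of the positive class) and F1-score."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    # F1 = 2PR / (P + R), written as 2TP / (2TP + FP + FN) to avoid P + R = 0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return accuracy, sensitivity, f1


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(binary_metrics(y_true, y_pred))  # accuracy 0.75, sensitivity 2/3, F1 2/3
```

On imbalanced data, accuracy alone can look high even when most minority samples are missed, which is why sensitivity and F-score on the positive (minority) class are reported alongside it.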
Similar articles

1
Feature selection via robust weighted score for high dimensional binary class-imbalanced gene expression data.
Heliyon. 2024 Sep 30;10(19):e38547. doi: 10.1016/j.heliyon.2024.e38547. eCollection 2024 Oct 15.
2
Robust proportional overlapping analysis for feature selection in binary classification within functional genomic experiments.
PeerJ Comput Sci. 2021 Jun 1;7:e562. doi: 10.7717/peerj-cs.562. eCollection 2021.
3
EKNN: Ensemble classifier incorporating connectivity and density into kNN with application to cancer diagnosis.
Artif Intell Med. 2021 Jan;111:101985. doi: 10.1016/j.artmed.2020.101985. Epub 2020 Nov 8.
4
Optimal features selection in the high dimensional data based on robust technique: Application to different health database.
Heliyon. 2024 Sep 2;10(17):e37241. doi: 10.1016/j.heliyon.2024.e37241. eCollection 2024 Sep 15.
5
R-Ensembler: A greedy rough set based ensemble attribute selection algorithm with kNN imputation for classification of medical data.
Comput Methods Programs Biomed. 2020 Feb;184:105122. doi: 10.1016/j.cmpb.2019.105122. Epub 2019 Oct 8.
6
Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm.
Genes (Basel). 2020 Jun 27;11(7):717. doi: 10.3390/genes11070717.
7
R-HEFS: Rough set based heterogeneous ensemble feature selection method for medical data classification.
Artif Intell Med. 2021 Apr;114:102049. doi: 10.1016/j.artmed.2021.102049. Epub 2021 Mar 6.
8
Two-stage feature selection for classification of gene expression data based on an improved Salp Swarm Algorithm.
Math Biosci Eng. 2022 Sep 19;19(12):13747-13781. doi: 10.3934/mbe.2022641.
9
Clinical data classification using an enhanced SMOTE and chaotic evolutionary feature selection.
Comput Biol Med. 2020 Nov;126:103991. doi: 10.1016/j.compbiomed.2020.103991. Epub 2020 Sep 18.
10
GSNFS: Gene subnetwork biomarker identification of lung cancer expression data.
BMC Med Genomics. 2016 Dec 5;9(Suppl 3):70. doi: 10.1186/s12920-016-0231-4.

Cited by

1
Margin weighted robust discriminant score for feature selection in imbalanced gene expression classification.
PLoS One. 2025 Jun 10;20(6):e0325147. doi: 10.1371/journal.pone.0325147. eCollection 2025.
