Margin weighted robust discriminant score for feature selection in imbalanced gene expression classification.

Authors

Gul Sheema, Muhammad Khan Dost, Aldahmani Saeed, Khan Zardad

Affiliations

Department of Statistics, Abdul Wali Khan University, Mardan, Pakistan.

Department of Statistics and Business Analytics, United Arab Emirates University, Al Ain, United Arab Emirates.

Publication information

PLoS One. 2025 Jun 10;20(6):e0325147. doi: 10.1371/journal.pone.0325147. eCollection 2025.

DOI:10.1371/journal.pone.0325147
PMID:40493555
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12151410/
Abstract

High-dimensional gene expression data pose significant challenges for binary classification, particularly for feature selection. Conventional methods, for example the Proportional Overlap Score, the Wilcoxon Rank-Sum Test, the Weighted Signal to Noise Ratio, ensemble Minimum Redundancy and Maximum Relevance, the Fisher Score, and the Robust Weighted Score for unbalanced data, are affected by key challenges such as class imbalance and redundancy. Mitigating these issues requires feature selection methods tailored to class imbalance. This study proposes a more robust solution, the Margin Weighted Robust Discriminant Score (MW-RDS), for feature selection in high-dimensional imbalanced problems. MW-RDS integrates a minority amplification factor to ensure that minority-class observations influence the feature ranking process. The amplification factor, together with class-specific stability weights obtained from a minority-focused robust discriminant score, is used to maximize the differential capability of genes/features. The score is then weighted by margin weights extracted from support vectors to enhance the discriminative power of genes/features, thereby highlighting their potential for class separation. Finally, the top-ranked genes/features are constrained using [Formula: see text]-regularization to discard redundant genes while identifying the most significant ones. The proposed method is tested on 9 openly accessible gene expression datasets using Random Forest, Support Vector Machine, and Weighted k Nearest Neighbors classifiers, in terms of accuracy, sensitivity, specificity, F1-score, and precision. The results show that the proposed method outperforms the existing methods in most cases. Boxplots and stability plots are also generated to give a deeper understanding of the results. To further assess the efficacy of the proposed method, the paper also presents a detailed simulation study.
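The five metrics named in the abstract can all be read off a binary confusion matrix. The following is a minimal sketch and is not taken from the paper: the function name `binary_metrics`, the choice of the minority class as the positive class, the toy labels, and the convention of returning 0.0 when a denominator is zero are all assumptions made for illustration. It shows why accuracy alone can mislead on imbalanced data, since a classifier that always predicts the majority class still scores well on accuracy while its sensitivity is zero.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    precision: float
    f1: float


def _ratio(num, den):
    # Assumed convention: an undefined ratio (zero denominator) is reported as 0.0.
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Compute the abstract's five metrics from paired true/predicted labels.

    `positive` marks the class treated as positive; here it is assumed to be
    the minority class.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    sensitivity = _ratio(tp, tp + fn)   # recall on the positive class
    specificity = _ratio(tn, tn + fp)   # recall on the negative class
    precision = _ratio(tp, tp + fp)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)
    return BinaryMetrics(accuracy, sensitivity, specificity, precision, f1)


# Toy imbalanced labels (hypothetical): 2 minority samples (1), 8 majority samples (0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class.
always_majority = binary_metrics(y_true, [0] * 10)

# A classifier that finds one minority sample and raises one false alarm.
mixed = binary_metrics(y_true, [1, 0, 1, 0, 0, 0, 0, 0, 0, 0])

print(always_majority)
print(mixed)
```

Both toy classifiers reach the same accuracy, yet only the second detects any minority sample, which is why sensitivity, precision, and F1-score are reported alongside accuracy in imbalanced settings.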

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c68/12151410/a9e661a68bbf/pone.0325147.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c68/12151410/000f53c44e4b/pone.0325147.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c68/12151410/a2342331a767/pone.0325147.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c68/12151410/14d4ed732f77/pone.0325147.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c68/12151410/a6c083dd87d1/pone.0325147.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c68/12151410/fbbaa7460017/pone.0325147.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c68/12151410/e88d565a0a02/pone.0325147.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c68/12151410/6b861f0a6bdd/pone.0325147.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c68/12151410/e78398d95722/pone.0325147.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c68/12151410/a29dec0c600b/pone.0325147.g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c68/12151410/bd79203e6e82/pone.0325147.g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c68/12151410/df7d236a15a0/pone.0325147.g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c68/12151410/78143d0780bd/pone.0325147.g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c68/12151410/2dd4de56eab0/pone.0325147.g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c68/12151410/300d1a930ee2/pone.0325147.g015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c68/12151410/c2353ff5f7d7/pone.0325147.g016.jpg
