AVC: Selecting discriminative features on basis of AUC by maximizing variable complementarity.

Authors

Sun Lei, Wang Jun, Wei Jinmao

Affiliation

Institute of Big Data, College of Computer and Control Engineering, Nankai University, 38 Tongyan Road, Tianjin, 300350, China.

Publication

BMC Bioinformatics. 2017 Mar 14;18(Suppl 3):50. doi: 10.1186/s12859-017-1468-4.

DOI:10.1186/s12859-017-1468-4
PMID:28361689
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5374660/
Abstract

BACKGROUND

The receiver operating characteristic (ROC) curve is widely used to evaluate classification performance in the biomedical field. Because it copes well with imbalanced and cost-sensitive data, the ROC curve has become a popular metric for evaluating and identifying disease-related genes (features). Existing ROC-based feature selection approaches are simple and effective at evaluating individual features. However, they may fail to find the true target feature subset because they lack an effective means of reducing redundancy between features, a step that is essential in machine learning.
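
As a rough illustration of what "evaluating individual features" with ROC analysis can look like, the sketch below scores each feature by its area under the ROC curve (AUC), computed as the fraction of positive/negative pairs that the feature orders correctly. The function names `feature_auc` and `rank_features_by_auc` are hypothetical, and this is a generic univariate ranking, not the procedure of any specific method cited in the abstract.

```python
import numpy as np


def feature_auc(x, y):
    """AUC of one feature used directly as a score; y holds 0/1 labels."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    pos = x[y == 1]
    neg = x[y == 0]
    # Count (positive, negative) pairs where the positive scores higher.
    greater = (pos[:, None] > neg[None, :]).sum()
    # Tied pairs count as half a correct ordering.
    ties = (pos[:, None] == neg[None, :]).sum()
    auc = (greater + 0.5 * ties) / (len(pos) * len(neg))
    # A feature that is lower in the positive class is just as informative,
    # so fold the score around 0.5.
    return max(auc, 1.0 - auc)


def rank_features_by_auc(X, y):
    """Return feature indices sorted by decreasing AUC, plus the scores."""
    X = np.asarray(X, dtype=float)
    scores = np.array([feature_auc(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-scores, kind="stable")
    return order, scores
```

A ranking of this kind treats every feature in isolation, so two highly correlated genes receive nearly identical scores and can both be selected. That is the redundancy problem the abstract describes.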

RESULTS

In this paper, we propose to assess feature complementarity by measuring, on the dimensions of pairwise features, the distances between misclassified instances and their nearest misses. If a misclassified instance and its nearest miss on one feature dimension are far apart on another feature dimension, the two features are regarded as complementary to each other. Building on this measure, we propose a novel filter feature selection approach based on ROC analysis. The new approach employs an efficient heuristic search strategy to select optimal features with the highest complementarities. Experimental results on a broad range of microarray data sets show that classifiers built on the feature subset selected by our approach achieve the minimal balanced error rate with a small number of significant features.
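
The complementarity idea can be sketched in a few lines. The abstract does not give the exact definitions, so the code below rests on several assumptions that are not taken from the paper: an instance counts as "misclassified" on a feature if the best single-threshold rule on that feature (lowest balanced error) gets it wrong; its "nearest miss" is the closest opposite-class instance on that same feature; complementarity is the mean distance between the two on the other feature; and the greedy search adds the candidate with the highest sum of AUC and average complementarity to the features already chosen. All function names are hypothetical.

```python
import numpy as np


def auc_1d(x, y):
    """Pairwise-ordering AUC of one feature, folded around 0.5."""
    pos, neg = x[y == 1], x[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    auc = (wins + 0.5 * ties) / (len(pos) * len(neg))
    return max(auc, 1.0 - auc)


def misclassified_by_feature(x, y):
    """Indices that the best single-threshold rule on x gets wrong (assumed rule)."""
    best_err, best_wrong = np.inf, np.array([], dtype=int)
    for t in np.unique(x):
        for sign in (1, -1):  # try both orientations of the threshold
            pred = (sign * x >= sign * t).astype(int)
            err = 0.5 * (np.mean(pred[y == 1] != 1) + np.mean(pred[y == 0] != 0))
            if err < best_err:
                best_err, best_wrong = err, np.flatnonzero(pred != y)
    return best_wrong


def complementarity(X, y, i, j):
    """Mean distance on feature j between instances misclassified on
    feature i and their nearest misses (nearest opposite-class points) on i."""
    wrong = misclassified_by_feature(X[:, i], y)
    if len(wrong) == 0:
        return 0.0  # feature i already separates the classes
    dists = []
    for k in wrong:
        other = np.flatnonzero(y != y[k])
        miss = other[np.argmin(np.abs(X[other, i] - X[k, i]))]
        dists.append(abs(X[k, j] - X[miss, j]))
    return float(np.mean(dists))


def select_features(X, y, n_select):
    """Greedy forward search: start from the best-AUC feature, then add the
    candidate maximizing AUC + mean complementarity (assumed scoring rule)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    Xs = (X - X.min(axis=0)) / np.where(span > 0, span, 1.0)  # scale to [0, 1]
    d = Xs.shape[1]
    auc = np.array([auc_1d(Xs[:, j], y) for j in range(d)])
    selected = [int(np.argmax(auc))]
    while len(selected) < min(n_select, d):
        best_j, best_score = None, -np.inf
        for j in range(d):
            if j in selected:
                continue
            comp = np.mean([complementarity(Xs, y, i, j) for i in selected])
            score = auc[j] + comp
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

Under these assumptions, an exact copy of an already selected feature earns little complementarity, because a misclassified instance and its nearest miss stay close together on the copy. A feature with a weaker AUC that pulls those two instances apart scores higher and is preferred.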

CONCLUSIONS

Compared with other ROC-based feature selection approaches, our new approach selects fewer features and effectively improves classification performance.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1569/5374660/0e9f685c645c/12859_2017_1468_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1569/5374660/ca2e086c03d2/12859_2017_1468_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1569/5374660/5a3ec12b0d4b/12859_2017_1468_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1569/5374660/0bc53583bee6/12859_2017_1468_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1569/5374660/633b3b56d22c/12859_2017_1468_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1569/5374660/dc937280d2f8/12859_2017_1468_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1569/5374660/3d9eda37cd98/12859_2017_1468_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1569/5374660/d2a11e2dd8b6/12859_2017_1468_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1569/5374660/c24db14e6cf0/12859_2017_1468_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1569/5374660/e4881a2a54a4/12859_2017_1468_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1569/5374660/60300f914965/12859_2017_1468_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1569/5374660/7619ab9ce981/12859_2017_1468_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1569/5374660/4ef75546fce1/12859_2017_1468_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1569/5374660/53fed916c67f/12859_2017_1468_Fig14_HTML.jpg
