Stable gene selection from microarray data via sample weighting.

Affiliation

Binghamton University, Binghamton.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2012 Jan-Feb;9(1):262-72. doi: 10.1109/TCBB.2011.47. Epub 2011 Mar 3.

DOI:10.1109/TCBB.2011.47

Abstract

Feature selection from gene expression microarray data is a widely used technique for selecting candidate genes in various cancer studies. Besides the predictive ability of the selected genes, an important aspect in evaluating a selection method is the stability of the selected genes. Experts instinctively have high confidence in the result of a selection method that selects similar sets of genes under some variations to the samples. However, a common problem of existing feature selection methods for gene expression data is that the genes selected by the same method often vary significantly with sample variations. In this work, we propose a general framework of sample weighting to improve the stability of feature selection methods under sample variations. The framework first weights each sample in a given training set according to its influence on the estimation of feature relevance, and then provides the weighted training set to a feature selection method. We also develop an efficient margin-based sample weighting algorithm under this framework. Experiments on a set of microarray data sets show that the proposed algorithm significantly improves the stability of representative feature selection algorithms such as SVM-RFE and ReliefF, without sacrificing their classification performance. Moreover, the proposed algorithm also leads to more stable gene signatures than the state-of-the-art ensemble method, particularly for small signature sizes.
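The two-step framework described in the abstract (weight each sample, then pass the weighted training set to a feature selector) can be illustrated with a small sketch. This is not the authors' algorithm. The weighting rule used here (a weight that shrinks as a sample's margin vector moves away from the others), the weighted mean-difference selector standing in for SVM-RFE or ReliefF, the Jaccard overlap as a stability score, and all function names are assumptions made for illustration only.

```python
import random


def manhattan(u, v):
    """L1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def margin_vectors(X, y):
    """Per-sample, per-feature margin: |x - nearest miss| - |x - nearest hit|."""
    n = len(X)
    vectors = []
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: manhattan(X[i], X[j]))
        miss = min(misses, key=lambda j: manhattan(X[i], X[j]))
        vectors.append([abs(x - m) - abs(x - h)
                        for x, h, m in zip(X[i], X[hit], X[miss])])
    return vectors


def sample_weights(X, y):
    """Assumed rule: samples that are outliers in margin space get lower weight."""
    M = margin_vectors(X, y)
    n = len(M)
    raw = []
    for i in range(n):
        avg = sum(manhattan(M[i], M[j]) for j in range(n) if j != i) / (n - 1)
        raw.append(1.0 / (1.0 + avg))
    total = sum(raw)
    return [n * r / total for r in raw]  # rescale so the mean weight is 1


def select_top_k(X, y, w, k):
    """Stand-in selector: rank features by the weighted class-mean difference."""
    def wmean(f, label):
        idx = [i for i in range(len(X)) if y[i] == label]
        return sum(w[i] * X[i][f] for i in idx) / sum(w[i] for i in idx)
    scores = [abs(wmean(f, 1) - wmean(f, 0)) for f in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda f: -scores[f])[:k]


def jaccard(a, b):
    """Overlap between two gene signatures; 1.0 means identical sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def make_toy_data(n_per_class=6, n_noise=4, seed=0):
    """Hypothetical data: feature 0 separates the classes, the rest is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = 10 * label + rng.uniform(-1, 1)
            X.append([informative] + [rng.uniform(-1, 1) for _ in range(n_noise)])
            y.append(label)
    return X, y


X, y = make_toy_data()
w = sample_weights(X, y)                # step 1: weight the samples
signature = select_top_k(X, y, w, k=2)  # step 2: weighted feature selection
```

To estimate stability in the sense the abstract uses, one would repeat both steps on several random subsamples of the training set and average `jaccard` over all pairs of resulting signatures; a higher average indicates that the selected genes change less under sample variation.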

Similar articles

Ensemble Feature Learning of Genomic Data Using Support Vector Machine.

PLoS One. 2016 Jun 15;11(6):e0157330. doi: 10.1371/journal.pone.0157330. eCollection 2016.

Cancer classification from gene expression data by NPPC ensemble.

IEEE/ACM Trans Comput Biol Bioinform. 2011 May-Jun;8(3):659-71. doi: 10.1109/TCBB.2010.36.

Recursive gene selection based on maximum margin criterion: a comparison with SVM-RFE.

BMC Bioinformatics. 2006 Dec 25;7:543. doi: 10.1186/1471-2105-7-543.

Robust feature selection for microarray data based on multicriterion fusion.

IEEE/ACM Trans Comput Biol Bioinform. 2011 Jul-Aug;8(4):1080-92. doi: 10.1109/TCBB.2010.103.

The feature selection bias problem in relation to high-dimensional gene data.

Artif Intell Med. 2016 Jan;66:63-71. doi: 10.1016/j.artmed.2015.11.001. Epub 2015 Nov 14.

Robust biomarker identification for cancer diagnosis with ensemble feature selection methods.

Bioinformatics. 2010 Feb 1;26(3):392-8. doi: 10.1093/bioinformatics/btp630. Epub 2009 Nov 25.

Hybrid Framework Using Multiple-Filters and an Embedded Approach for an Efficient Selection and Classification of Microarray Data.

IEEE/ACM Trans Comput Biol Bioinform. 2016 Jan-Feb;13(1):12-26. doi: 10.1109/TCBB.2015.2474384. Epub 2015 Aug 28.

Double Selection Based Semi-Supervised Clustering Ensemble for Tumor Clustering from Gene Expression Profiles.

IEEE/ACM Trans Comput Biol Bioinform. 2014 Jul-Aug;11(4):727-40. doi: 10.1109/TCBB.2014.2315996.

Feature weight estimation for gene selection: a local hyperlinear learning approach.

BMC Bioinformatics. 2014 Mar 14;15:70. doi: 10.1186/1471-2105-15-70.

Cited by

A New Differential Gene Expression Based Simulated Annealing for Solving Gene Selection Problem: A Case Study on Eosinophilic Esophagitis and Few Other Gastro-intestinal Diseases.

Biochem Genet. 2024 Dec 6. doi: 10.1007/s10528-024-10987-z.

An Immune-Gene-Based Classifier Predicts Prognosis in Patients With Cervical Squamous Cell Carcinoma.

Front Mol Biosci. 2021 Jul 5;8:679474. doi: 10.3389/fmolb.2021.679474. eCollection 2021.

A Hybrid Ensemble Approach for Identifying Robust Differentially Methylated Loci in Pan-Cancers.

Front Genet. 2019 Sep 5;10:774. doi: 10.3389/fgene.2019.00774. eCollection 2019.

Classifying Incomplete Gene-Expression Data: Ensemble Learning with Non-Pre-Imputation Feature Filtering and Best-First Search Technique.

Int J Mol Sci. 2018 Oct 30;19(11):3398. doi: 10.3390/ijms19113398.

An Integrated Approach for Identifying Molecular Subtypes in Human Colon Cancer Using Gene Expression Data.

Genes (Basel). 2018 Aug 2;9(8):397. doi: 10.3390/genes9080397.

An Occlusion-Robust Feature Selection Framework in Pedestrian Detection.

Sensors (Basel). 2018 Jul 13;18(7):2272. doi: 10.3390/s18072272.

An experimental study of the intrinsic stability of random forest variable importance measures.

BMC Bioinformatics. 2016 Feb 3;17:60. doi: 10.1186/s12859-016-0900-5.

iRDA: a new filter towards predictive, stable, and enriched candidate genes.

BMC Genomics. 2015 Dec 9;16:1041. doi: 10.1186/s12864-015-2129-5.

Algebraic comparison of partial lists in bioinformatics.

PLoS One. 2012;7(5):e36540. doi: 10.1371/journal.pone.0036540. Epub 2012 May 17.
