A class imbalance-aware Relief algorithm for the classification of tumors using microarray gene expression data.

College of Information Science and Engineering, Hunan University, Changsha, China.

Comput Biol Chem. 2019 Jun;80:121-127. doi: 10.1016/j.compbiolchem.2019.03.017. Epub 2019 Mar 24.

DNA microarray data have been widely used in cancer research because they help to distinguish between tumor classes. However, typical gene expression data are high-dimensional and class-imbalanced, which poses a severe challenge for traditional machine learning methods in constructing a robust classifier that performs well on both the minority and majority classes. As one of the most successful feature weighting techniques, Relief is considered particularly well suited to high-dimensional problems. Unfortunately, almost all Relief-based methods fail to take the imbalanced class distribution into account. This study finds that existing Relief-based algorithms may underestimate features that discriminate the minority classes and ignore the distributional characteristics of minority class samples. As a result, an additional bias towards classification into the majority classes can be introduced. To address this, a new method, named imRelief, is proposed for efficiently handling high-dimensional imbalanced gene expression data. imRelief corrects the bias towards the majority classes and accounts for the scattered distribution of minority class samples when estimating feature weights. In this way, imRelief rewards features that perform well at separating the minority classes from the other classes. Experiments on four microarray gene expression data sets demonstrate the effectiveness of imRelief in both feature weighting and feature subset selection applications.
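To make the bias described above concrete, the following is a minimal sketch of classic Relief feature weighting (nearest hit and nearest miss per sample), written in plain Python. The abstract does not give the imRelief update rule, so the `balance_classes` option here is only an illustrative assumption (inverse class-frequency scaling so that each class contributes equally to the weights); it is not the paper's method. The function name, its parameters, and the toy data are hypothetical.

```python
from collections import Counter


def relief_weights(X, y, balance_classes=False):
    """Classic Relief over every sample (deterministic, no random sampling).

    X: list of feature vectors (lists of floats); y: list of class labels.
    balance_classes=True is an illustrative assumption, NOT the paper's
    imRelief: each sample's update is scaled by 1 / (size of its class),
    so every class contributes the same total amount to the weights.
    """
    n, d = len(X), len(X[0])

    # Per-feature range, used to normalise differences to [0, 1].
    spans = []
    for f in range(d):
        col = [row[f] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    counts = Counter(y)
    weights = [0.0] * d
    total = 0.0
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # no same-class or other-class neighbour available
        hit = X[min(hits, key=lambda j: dist(X[i], X[j]))]
        miss = X[min(misses, key=lambda j: dist(X[i], X[j]))]
        scale = 1.0 / counts[y[i]] if balance_classes else 1.0
        total += scale
        for f in range(d):
            # Reward features that differ on the nearest miss,
            # penalise features that differ on the nearest hit.
            weights[f] += scale * (diff(f, X[i], miss) - diff(f, X[i], hit))
    return [w / total for w in weights] if total else weights


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X_toy = [[0.0, 0.1], [0.1, 0.9], [0.2, 0.5], [0.15, 0.3], [1.0, 0.4], [0.9, 0.8]]
y_toy = [0, 0, 0, 0, 1, 1]
print(relief_weights(X_toy, y_toy))
print(relief_weights(X_toy, y_toy, balance_classes=True))
```

Without the scaling, the four majority-class samples supply two thirds of the updates, which is the kind of majority-class dominance the abstract attributes to existing Relief-based methods.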

Similar articles

A class imbalance-aware Relief algorithm for the classification of tumors using microarray gene expression data.

Comput Biol Chem. 2019 Jun;80:121-127. doi: 10.1016/j.compbiolchem.2019.03.017. Epub 2019 Mar 24.

C-HMOSHSSA: Gene selection for cancer classification using multi-objective meta-heuristic and machine learning methods.

Comput Methods Programs Biomed. 2019 Sep;178:219-235. doi: 10.1016/j.cmpb.2019.06.029. Epub 2019 Jun 29.

The feature selection bias problem in relation to high-dimensional gene data.

Artif Intell Med. 2016 Jan;66:63-71. doi: 10.1016/j.artmed.2015.11.001. Epub 2015 Nov 14.

Feature weight estimation for gene selection: a local hyperlinear learning approach.

BMC Bioinformatics. 2014 Mar 14;15:70. doi: 10.1186/1471-2105-15-70.

Class prediction for high-dimensional class-imbalanced data.

BMC Bioinformatics. 2010 Oct 20;11:523. doi: 10.1186/1471-2105-11-523.

Chaotic genetic algorithm for gene selection and classification problems.

OMICS. 2009 Oct;13(5):407-20. doi: 10.1089/omi.2009.0007.

Stable gene selection from microarray data via sample weighting.

IEEE/ACM Trans Comput Biol Bioinform. 2012 Jan-Feb;9(1):262-72. doi: 10.1109/TCBB.2011.47. Epub 2011 Mar 3.

A centroid-based gene selection method for microarray data classification.

J Theor Biol. 2016 Jul 7;400:32-41. doi: 10.1016/j.jtbi.2016.03.034. Epub 2016 Apr 4.

Tumor classification ranking from microarray data.

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S21. doi: 10.1186/1471-2164-9-S2-S21.

Class-imbalanced classifiers for high-dimensional data.

Brief Bioinform. 2013 Jan;14(1):13-26. doi: 10.1093/bib/bbs006. Epub 2012 Mar 9.

Cited by

Navigating the microarray landscape: a comprehensive review of feature selection techniques and their applications.

Front Big Data. 2025 Jul 10;8:1624507. doi: 10.3389/fdata.2025.1624507. eCollection 2025.
