Incorporating feature ranking and evolutionary methods for the classification of high-dimensional DNA microarray gene expression data.

Authors

Abedini Mani, Kirley Michael, Chiong Raymond

Affiliations

Department of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia; IBM Research Australia, Carlton, Victoria 3053, Australia.

Publication information

Australas Med J. 2013 May 30;6(5):272-9. doi: 10.4066/AMJ.2013.1641. Print 2013.

DOI:10.4066/AMJ.2013.1641

PMID:23745148

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3674418/

Abstract

BACKGROUND

DNA microarray gene expression classification is a challenging task for machine learning. Gene expression data sets typically range from several thousand to over 10,000 genes. A potential solution to this high dimensionality is to use feature selection to reduce the number of features.

AIMS

The aim of this paper is to investigate how we can use feature quality information to improve the precision of microarray gene expression classification tasks.

METHOD

We propose two evolutionary machine learning models based on the eXtended Classifier System (XCS) and a typical feature selection methodology. The first one, which we call FS-XCS, uses feature selection for feature reduction purposes. The second model is GRD-XCS, which uses feature ranking to bias the rule discovery process of XCS.
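The difference between the two ways of using feature quality information can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not implement XCS itself. The abstract does not state which ranking criterion is used or how ranks are turned into a bias, so the signal-to-noise score, the linear mapping to probabilities, and every function and parameter name here (`rank_features`, `select_top_k`, `specify_probabilities`, `random_rule_mask`, `p_min`, `p_max`) are assumptions made for illustration only.

```python
import random
from statistics import mean, pstdev


def rank_features(X, y):
    """Score each feature for a two-class problem (labels 0 and 1).

    Assumed criterion: absolute difference of class means divided by the
    sum of class standard deviations (a signal-to-noise style score).
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pstdev(pos) + pstdev(neg)
        if spread == 0:
            spread = 1e-12  # avoid division by zero for constant features
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return scores


def select_top_k(scores, k):
    """Feature reduction: keep only the k best-scoring feature indices.

    All other features become invisible to the learner.
    """
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(order[:k])


def specify_probabilities(scores, p_min=0.05, p_max=0.9):
    """Ranking bias: map scores linearly onto [p_min, p_max].

    Because p_min > 0, low-ranked features can still appear in a rule.
    """
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0
    return [p_min + (p_max - p_min) * (s - lo) / span for s in scores]


def random_rule_mask(probs, rng):
    """Draw which features a new rule tests; False means "don't care"."""
    return [rng.random() < p for p in probs]


if __name__ == "__main__":
    # Toy data: only feature 0 separates the two classes.
    X = [[5.0, 1.0, 0.2], [5.2, 0.9, 0.1], [4.9, 1.1, 0.3],
         [1.0, 1.0, 0.2], [1.1, 0.9, 0.3], [0.9, 1.1, 0.1]]
    y = [1, 1, 1, 0, 0, 0]
    scores = rank_features(X, y)
    print("kept after reduction:", select_top_k(scores, 1))
    probs = specify_probabilities(scores)
    print("rule mask under ranking bias:",
          random_rule_mask(probs, random.Random(0)))
```

In this sketch, `select_top_k` discards low-ranked features outright, whereas `specify_probabilities` only makes them less likely to be tested by a newly generated rule, so the learner can still explore them.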

RESULTS

The results indicate that the use of feature selection/ranking methods is essential for tackling high-dimensional classification tasks, such as microarray gene expression classification. However, the results also suggest that using feature ranking to bias the rule discovery process performs significantly better than using the feature reduction method. In other words, using feature quality information to develop a smarter learning procedure is more effective than reducing the feature set.

CONCLUSION

Our findings show that extracting feature quality information can assist the learning process and improve classification accuracy. On the other hand, relying exclusively on feature quality information (e.g., through feature reduction) can decrease classification performance. Therefore, we recommend a hybrid approach that uses feature quality information to direct the learning process by highlighting the more informative features, without preventing the learning process from exploring other features.
