Selecting a minimal number of relevant genes from microarray data to design accurate tissue classifiers.

Authors

Huang Hui-Ling, Lee Chong-Cheng, Ho Shinn-Ying

Affiliation

Department of Information Management, Jin-Wen Institute of Technology, Hsin-Tien 231, Taiwan.

Publication

Biosystems. 2007 Jul-Aug;90(1):78-86. doi: 10.1016/j.biosystems.2006.07.002. Epub 2006 Jul 10.

Abstract

Selecting a minimal number of relevant genes from microarray data while maximizing classification accuracy is essential for developing inexpensive diagnostic tests. However, optimizing gene selection and classification accuracy simultaneously is intractable, because it constitutes a large parameter optimization problem. We propose an efficient evolutionary approach to gene selection from microarray data that can be combined with the optimal design of various multiclass classifiers. The proposed method, named GeneSelect, consists of three fully cooperating parts: an efficient encoding scheme for candidate solutions, a generalized fitness function, and an intelligent genetic algorithm (IGA). An existing hybrid approach based on a genetic algorithm and maximum likelihood classification (GA/MLHD) was previously proposed to select a small number of relevant genes for accurate sample classification. To evaluate GeneSelect and allow direct comparison, we combine its gene selection with the same maximum likelihood classifier; the resulting method is named IGA/MLHD. IGA/MLHD is applied to 11 cancer-related human gene expression datasets. The simulation results show that IGA/MLHD outperforms GA/MLHD in terms of the number of selected genes, classification accuracy, and the robustness of both the selected genes and the accuracy.
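The abstract names GeneSelect's encoding scheme and generalized fitness function but does not give their exact form. The Python sketch here illustrates one common way such components can be built, under stated assumptions: each candidate solution is a bit string with one bit per gene, and fitness is leave-one-out accuracy minus a small penalty proportional to the fraction of genes selected. The classifier is a simple per-class diagonal-Gaussian maximum-likelihood rule used as a stand-in for MLHD, not the paper's classifier. The names `selected_genes`, `ml_predict`, `loocv_accuracy`, `fitness`, the `weight` parameter, and the toy data are all hypothetical.

```python
import math
from typing import Hashable, List, Sequence

# Hypothetical encoding: one bit per gene, 1 means the gene is selected.
Chromosome = Sequence[int]
Sample = Sequence[float]


def selected_genes(chrom: Chromosome) -> List[int]:
    """Decode a bit-string chromosome into the indices of selected genes."""
    return [i for i, bit in enumerate(chrom) if bit]


def ml_predict(train_X: Sequence[Sample], train_y: Sequence[Hashable],
               x: Sample, genes: Sequence[int]) -> Hashable:
    """Return the class whose diagonal-Gaussian log-likelihood of x is highest.

    Stand-in for MLHD (assumption): per-class mean and variance are
    estimated independently for each selected gene.
    """
    best_cls, best_ll = None, -math.inf
    for cls in sorted(set(train_y), key=str):
        rows = [r for r, lab in zip(train_X, train_y) if lab == cls]
        ll = 0.0
        for g in genes:
            vals = [r[g] for r in rows]
            mu = sum(vals) / len(vals)
            # Small floor keeps the variance strictly positive.
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6
            ll += -0.5 * math.log(2 * math.pi * var) - (x[g] - mu) ** 2 / (2 * var)
        if ll > best_ll:
            best_cls, best_ll = cls, ll
    return best_cls


def loocv_accuracy(X: Sequence[Sample], y: Sequence[Hashable],
                   genes: Sequence[int]) -> float:
    """Leave-one-out accuracy of ml_predict restricted to the given genes."""
    if not genes:
        return 0.0  # an empty gene subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        train_X = list(X[:i]) + list(X[i + 1:])
        train_y = list(y[:i]) + list(y[i + 1:])
        if ml_predict(train_X, train_y, X[i], genes) == y[i]:
            correct += 1
    return correct / len(X)


def fitness(chrom: Chromosome, X: Sequence[Sample], y: Sequence[Hashable],
            weight: float = 0.01) -> float:
    """Hypothetical fitness: accuracy minus a penalty on the fraction of genes kept."""
    genes = selected_genes(chrom)
    return loocv_accuracy(X, y, genes) - weight * len(genes) / len(chrom)


if __name__ == "__main__":
    # Toy data (hypothetical): gene 0 separates the classes, gene 1 is noise.
    X = [[0.0, 5.0], [0.2, 1.0], [0.1, 3.0], [5.0, 4.0], [5.2, 2.0], [4.9, 0.5]]
    y = ["A", "A", "A", "B", "B", "B"]
    for chrom in ([1, 0], [0, 1], [1, 1]):
        print(chrom, round(fitness(chrom, X, y), 4))
```

With a fitness of this shape, a genetic algorithm only needs to maximize `fitness`: two subsets with equal accuracy are ranked by size, which steers the search toward minimal gene sets.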
