Filter versus wrapper gene selection approaches in DNA microarray domains.

Author information

Inza Iñaki, Larrañaga Pedro, Blanco Rosa, Cerrolaza Antonio J

Affiliation

Department of Computer Science and Artificial Intelligence, University of the Basque Country, P.O. Box 649, E-20080 Donostia-San Sebastián, Basque Country, Spain.

Publication information

Artif Intell Med. 2004 Jun;31(2):91-103. doi: 10.1016/j.artmed.2004.01.007.

DOI:10.1016/j.artmed.2004.01.007

PMID:15219288

Abstract

DNA microarray experiments, which generate thousands of gene expression measurements, are used to collect information from tissue and cell samples about gene expression differences that could be useful for disease diagnosis, distinguishing specific tumor types, and similar tasks. One important application of gene expression microarray data is the classification of samples into known categories. Because DNA microarray technology measures gene expression en masse, the resulting data have far more features (genes) than samples. Because the predictive accuracy of supervised classifiers that try to discriminate between the classes of the problem decays in the presence of irrelevant and redundant features, a dimensionality reduction process is essential. We propose the application of a gene selection process, which also enables the biology researcher to focus on promising gene candidates that actively contribute to classification in these large-scale microarrays. Two basic approaches to feature selection appear in the machine learning and pattern recognition literature: the filter and wrapper techniques. Filter procedures are used in most work on DNA microarrays. In this work, a group of different filter metrics is compared with a wrapper sequential search procedure. The comparison is performed on two well-known DNA microarray datasets using four classic supervised classifiers. The study is carried out over both the original continuous gene expression data and the same data discretized into three intervals. Two well-known filter metrics are used for continuous data, and four classic filter measures are used for discretized data. The same wrapper approach is used for both continuous and discretized data.

Applying filter and wrapper gene selection procedures yields considerably better accuracy than using no gene selection, together with notable dimensionality reductions. Although the wrapper approach is mostly more accurate than the filter metrics, this improvement comes at a considerable computational cost. Most of the genes selected by the proposed filter and wrapper procedures in discrete and continuous microarray data appear in the lists of relevant, informative genes detected by previous studies of these datasets. The aim of this work is to contribute to the gene selection task in DNA microarray datasets. Through an extensive comparison with the more popular filter techniques, we aim to contribute to the expansion and study of the wrapper approach in this type of domain.
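
The distinction between the two families can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the abstract does not name the filter metrics or the classifiers, so the signal-to-noise style score, the nearest-centroid classifier, the leave-one-out estimate, the synthetic data, and all function names (`filter_rank`, `loo_accuracy`, `wrapper_forward`) are assumptions made for the example.

```python
import random
from statistics import mean, pstdev


def filter_rank(X, y, k):
    """Filter: score each gene on its own, without any classifier, keep the top k."""
    def score(j):
        a = [row[j] for row, c in zip(X, y) if c == 0]
        b = [row[j] for row, c in zip(X, y) if c == 1]
        # Assumed metric: difference of class means over summed spreads.
        return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given genes."""
    hits = 0
    for i in range(len(X)):
        centroids = {}
        for c in set(y):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == c]
            centroids[c] = [mean(row[j] for row in rows) for j in genes]

        def dist(c):
            return sum((X[i][j] - m) ** 2 for j, m in zip(genes, centroids[c]))

        predicted = min(centroids, key=dist)
        hits += predicted == y[i]
    return hits / len(X)


def wrapper_forward(X, y, max_genes):
    """Wrapper: greedy sequential forward search scored by the classifier itself."""
    selected, best = [], 0.0
    while len(selected) < max_genes:
        candidates = [j for j in range(len(X[0])) if j not in selected]
        scored = [(loo_accuracy(X, y, selected + [j]), j) for j in candidates]
        acc, j = max(scored, key=lambda t: t[0])
        if acc <= best:  # stop as soon as no candidate improves accuracy
            break
        selected.append(j)
        best = acc
    return selected, best


# Synthetic data (an assumption for illustration): 20 samples, 30 genes,
# and only the gene at index INFORMATIVE differs between the two classes.
random.seed(0)
N_SAMPLES, N_GENES, INFORMATIVE = 20, 30, 2
y = [i % 2 for i in range(N_SAMPLES)]
X = [[random.gauss(0, 1) for _ in range(N_GENES)] for _ in range(N_SAMPLES)]
for row, c in zip(X, y):
    row[INFORMATIVE] += 10 * c

print("filter top gene:", filter_rank(X, y, 1))
print("wrapper selection:", wrapper_forward(X, y, 3))
```

The filter scores every gene once, while the wrapper retrains and re-evaluates the classifier for each candidate gene at each step, which is where the extra computational cost mentioned in the abstract comes from.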

Similar articles

A novel feature selection approach for biomedical data classification.

J Biomed Inform. 2010 Feb;43(1):15-23. doi: 10.1016/j.jbi.2009.07.008. Epub 2009 Jul 30.

Wrapper filtering criteria via linear neuron and kernel approaches.

Comput Biol Med. 2008 Aug;38(8):894-912. doi: 10.1016/j.compbiomed.2008.05.005. Epub 2008 Jul 24.

Gene selection from microarray data for cancer classification--a machine learning approach.

Comput Biol Chem. 2005 Feb;29(1):37-46. doi: 10.1016/j.compbiolchem.2004.11.001.

Tumor classification ranking from microarray data.

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S21. doi: 10.1186/1471-2164-9-S2-S21.

Independent component analysis-based penalized discriminant method for tumor classification using gene expression data.

Bioinformatics. 2006 Aug 1;22(15):1855-62. doi: 10.1093/bioinformatics/btl190. Epub 2006 May 18.

The ties problem resulting from counting-based error estimators and its impact on gene selection algorithms.

Bioinformatics. 2006 Oct 15;22(20):2507-15. doi: 10.1093/bioinformatics/btl438. Epub 2006 Aug 14.

A GMM-IG framework for selecting genes as expression panel biomarkers.

Artif Intell Med. 2010 Feb-Mar;48(2-3):75-82. doi: 10.1016/j.artmed.2009.07.006. Epub 2009 Dec 8.

Classification of microarray data with factor mixture models.

Bioinformatics. 2006 Jan 15;22(2):202-8. doi: 10.1093/bioinformatics/bti779. Epub 2005 Nov 15.

Selecting a minimal number of relevant genes from microarray data to design accurate tissue classifiers.

Biosystems. 2007 Jul-Aug;90(1):78-86. doi: 10.1016/j.biosystems.2006.07.002. Epub 2006 Jul 10.

Cited by

Variable Selection for Multivariate Failure Time Data via Regularized Sparse-Input Neural Network.

Bioengineering (Basel). 2025 May 31;12(6):596. doi: 10.3390/bioengineering12060596.

Machine learning prediction of malaria vaccine efficacy based on antibody profiles.

PLoS Comput Biol. 2024 Jun 7;20(6):e1012131. doi: 10.1371/journal.pcbi.1012131. eCollection 2024 Jun.

Antibody selection strategies and their impact in predicting clinical malaria based on multi-sera data.
