Robust biomarker identification for cancer diagnosis with ensemble feature selection methods.

Affiliation

Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, Belgium.

Publication information

Bioinformatics. 2010 Feb 1;26(3):392-8. doi: 10.1093/bioinformatics/btp630. Epub 2009 Nov 25.

DOI:10.1093/bioinformatics/btp630

PMID:19942583

Abstract

MOTIVATION

Biomarker discovery is an important topic in biomedical applications of computational biology, including applications such as gene and SNP selection from high-dimensional data. Surprisingly, the robustness of such selection processes, that is, their stability with respect to sampling variation, has received attention only recently. Yet robustness of biomarkers is an important issue, as it may greatly influence subsequent biological validation. In addition, a more robust set of markers may strengthen an expert's confidence in the results of a selection method.

RESULTS

Our first contribution is a general framework for the analysis of the robustness of a biomarker selection algorithm. Secondly, we conducted a large-scale analysis of the recently introduced concept of ensemble feature selection, where multiple feature selections are combined in order to increase the robustness of the final set of selected features. We focus on selection methods that are embedded in the estimation of support vector machines (SVMs). SVMs are powerful classification models that have shown state-of-the-art performance on several diagnosis and prognosis tasks on biological data. Their feature selection extensions have also offered good results for gene selection tasks. We show that the robustness of SVMs for biomarker discovery can be substantially increased by using ensemble feature selection techniques, while at the same time improving classification performance. The proposed methodology is evaluated on four microarray datasets, showing increases of up to almost 30% in robustness of the selected biomarkers, along with an improvement of approximately 15% in classification performance. The stability improvement with ensemble methods is particularly noticeable for small signature sizes (a few tens of genes), which is most relevant for the design of a diagnosis or prognosis model from a gene signature.
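To make the idea of combining multiple feature selections concrete, the following is a minimal Python sketch, not the authors' implementation. It rests on several assumptions the abstract does not state: the base selector is a simple signal-to-noise filter standing in for an SVM-embedded selector, the ensemble is built by summing ranks over stratified bootstrap samples, and robustness is measured as the mean pairwise Jaccard overlap of signatures selected on random subsamples. All function names and the synthetic data are hypothetical.

```python
import random
from itertools import combinations


def _mean_std(values):
    """Population mean and standard deviation of a non-empty list."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, var ** 0.5


def rank_features(X, y):
    """Stand-in base selector (assumption): rank features by a
    signal-to-noise score. Returns ranks[j], where 0 is the best rank."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        m1, s1 = _mean_std(pos)
        m0, s0 = _mean_std(neg)
        scores.append(abs(m1 - m0) / (s1 + s0 + 1e-9))
    order = sorted(range(n_features), key=lambda j: -scores[j])
    ranks = [0] * n_features
    for position, j in enumerate(order):
        ranks[j] = position
    return ranks


def _resample(X, y, rng, frac=None):
    """Stratified resampling: bootstrap (with replacement) when frac is
    None, otherwise a subsample of the given fraction without replacement."""
    idx = []
    for label in sorted(set(y)):
        members = [i for i, lab in enumerate(y) if lab == label]
        if frac is None:
            idx += [rng.choice(members) for _ in members]
        else:
            idx += rng.sample(members, max(2, int(frac * len(members))))
    return [X[i] for i in idx], [y[i] for i in idx]


def single_select(X, y, k):
    """Signature from one run of the base selector."""
    ranks = rank_features(X, y)
    return set(sorted(range(len(ranks)), key=lambda j: ranks[j])[:k])


def ensemble_select(X, y, k, n_bags, rng):
    """Signature from rank sums aggregated over bootstrap samples."""
    total = [0] * len(X[0])
    for _ in range(n_bags):
        Xb, yb = _resample(X, y, rng)
        for j, r in enumerate(rank_features(Xb, yb)):
            total[j] += r
    return set(sorted(range(len(total)), key=lambda j: total[j])[:k])


def stability(signatures):
    """Mean pairwise Jaccard overlap between selected feature sets."""
    pairs = list(combinations(signatures, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


def make_data(n_samples, n_features, n_informative, rng):
    """Synthetic two-class data; the first n_informative features shift."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 1.0 * label
        X.append(row)
        y.append(label)
    return X, y


rng = random.Random(0)
X, y = make_data(40, 100, 10, rng)
single_sigs, ensemble_sigs = [], []
for _ in range(5):
    Xs, ys = _resample(X, y, rng, frac=0.9)  # perturb the training data
    single_sigs.append(single_select(Xs, ys, 10))
    ensemble_sigs.append(ensemble_select(Xs, ys, 10, 10, rng))
print("single selector stability:  ", stability(single_sigs))
print("ensemble selector stability:", stability(ensemble_sigs))
```

The two printed values compare how much the selected signature changes under data perturbation for a single selector versus the ensemble. On real data the base selector, the aggregation rule, and the stability measure would all need to match the study being reproduced.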

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

Robust and efficient identification of biomarkers by classifying features on graphs.

Bioinformatics. 2008 Sep 15;24(18):2023-9. doi: 10.1093/bioinformatics/btn383. Epub 2008 Jul 24.

Gene selection via the BAHSIC family of algorithms.

Bioinformatics. 2007 Jul 1;23(13):i490-8. doi: 10.1093/bioinformatics/btm216.

Gene selection in cancer classification using sparse logistic regression with Bayesian regularization.

Bioinformatics. 2006 Oct 1;22(19):2348-55. doi: 10.1093/bioinformatics/btl386. Epub 2006 Jul 14.

Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm.

Comput Biol Chem. 2010 Aug;34(4):244-50. doi: 10.1016/j.compbiolchem.2010.08.003. Epub 2010 Sep 9.

Robustness of chemometrics-based feature selection methods in early cancer detection and biomarker discovery.

Stat Appl Genet Mol Biol. 2013 Mar 13;12(2):207-23. doi: 10.1515/sagmb-2012-0067.

Recursive gene selection based on maximum margin criterion: a comparison with SVM-RFE.

BMC Bioinformatics. 2006 Dec 25;7:543. doi: 10.1186/1471-2105-7-543.

A combination of rough-based feature selection and RBF neural network for classification using gene expression data.

IEEE Trans Nanobioscience. 2008 Mar;7(1):91-9. doi: 10.1109/TNB.2008.2000142.

ESVM: evolutionary support vector machine for automatic feature selection and classification of microarray data.

Biosystems. 2007 Sep-Oct;90(2):516-28. doi: 10.1016/j.biosystems.2006.12.003. Epub 2006 Dec 16.

Clustering threshold gradient descent regularization: with applications to microarray studies.

Bioinformatics. 2007 Feb 15;23(4):466-72. doi: 10.1093/bioinformatics/btl632. Epub 2006 Dec 20.

Cited by

A Hybrid Ensemble Equilibrium Optimizer Gene Selection Algorithm for Microarray Data.

Biomimetics (Basel). 2025 Aug 10;10(8):523. doi: 10.3390/biomimetics10080523.

Hybrid classical and quantum computing for enhanced glioma tumor classification using TCGA data.

Sci Rep. 2025 Jul 17;15(1):25935. doi: 10.1038/s41598-025-97067-3.

Prediction of glycaemic control and quality of life in people with type 2 diabetes using glucose-lowering drugs with machine learning-The Maastricht study.

Diabetes Obes Metab. 2025 Oct;27(10):5524-5537. doi: 10.1111/dom.16598. Epub 2025 Jul 17.

Auxiliary diagnosis of primary bone tumors based on Machine learning model.

J Bone Oncol. 2024 Nov 9;49:100648. doi: 10.1016/j.jbo.2024.100648. eCollection 2024 Dec.

Stable multivariate lesion symptom mapping.

Apert Neuro. 2024;4. doi: 10.52294/001c.117311. Epub 2024 Jun 7.

The Mclust Analysis of Tumor Budding Unveils the Role of the Collagen Family in Cervical Cancer Progression.

Life (Basel). 2024 Aug 13;14(8):1004. doi: 10.3390/life14081004.

Artificial intelligence methods available for cancer research.

Front Med. 2024 Oct;18(5):778-797. doi: 10.1007/s11684-024-1085-3. Epub 2024 Aug 8.

Discrimination of Etiologically Different Cholestasis by Modeling Proteomics Datasets.

Int J Mol Sci. 2024 Mar 26;25(7):3684. doi: 10.3390/ijms25073684.

Ensemble learning for integrative prediction of genetic values with genomic variants.

BMC Bioinformatics. 2024 Mar 21;25(1):120. doi: 10.1186/s12859-024-05720-x.

Databases and computational methods for the identification of piRNA-related molecules: A survey.

Comput Struct Biotechnol J. 2024 Jan 22;23:813-833. doi: 10.1016/j.csbj.2024.01.011. eCollection 2024 Dec.
