Robust biomarker identification for cancer diagnosis with ensemble feature selection methods.

Author affiliation

Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, Belgium.

Publication information

Bioinformatics. 2010 Feb 1;26(3):392-8. doi: 10.1093/bioinformatics/btp630. Epub 2009 Nov 25.

Abstract

MOTIVATION

Biomarker discovery is an important topic in biomedical applications of computational biology, including applications such as gene and SNP selection from high-dimensional data. Surprisingly, the robustness of such selection processes, i.e. their stability with respect to sampling variation, has received attention only recently. Yet the robustness of biomarkers is an important issue, as it may greatly influence subsequent biological validation. In addition, a more robust set of markers may strengthen an expert's confidence in the results of a selection method.

RESULTS

Our first contribution is a general framework for analyzing the robustness of a biomarker selection algorithm. Second, we conducted a large-scale analysis of the recently introduced concept of ensemble feature selection, in which multiple feature selections are combined to increase the robustness of the final set of selected features. We focus on selection methods that are embedded in the estimation of support vector machines (SVMs). SVMs are powerful classification models that have shown state-of-the-art performance on several diagnosis and prognosis tasks on biological data. Their feature selection extensions have also given good results on gene selection tasks. We show that the robustness of SVMs for biomarker discovery can be substantially increased by using ensemble feature selection techniques, while at the same time improving classification performance. The proposed methodology is evaluated on four microarray datasets, showing increases of up to almost 30% in the robustness of the selected biomarkers, along with an improvement of approximately 15% in classification performance. The stability improvement with ensemble methods is particularly noticeable for small signature sizes (a few tens of genes), which is the range most relevant for designing a diagnosis or prognosis model from a gene signature.
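The idea of combining several feature selections, and of measuring how stable the resulting signature is, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from the paper: the base ranker is a simple class-mean difference standing in for an SVM-based ranker, the rankings are combined by summing ranks over bootstrap samples, and stability is scored with the Kuncheva consistency index averaged over pairs of signatures. All function names and the toy data are hypothetical.

```python
import random
from itertools import combinations


def mean_diff_ranker(X, y):
    """Stand-in ranker (not SVM-RFE): order features by the absolute
    difference between the two class means, best feature first."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(n_features), key=lambda j: -scores[j])


def stratified_bootstrap(X, y, rng):
    """Resample with replacement inside each class, so both classes survive."""
    idx = []
    for label in sorted(set(y)):
        members = [i for i, v in enumerate(y) if v == label]
        idx.extend(rng.choice(members) for _ in members)
    return [X[i] for i in idx], [y[i] for i in idx]


def ensemble_select(X, y, k, n_bags=20, ranker=mean_diff_ranker, seed=0):
    """Rank features on each bootstrap sample, sum the ranks across
    samples, and keep the k features with the lowest total rank."""
    rng = random.Random(seed)
    n_features = len(X[0])
    rank_sum = [0] * n_features
    for _ in range(n_bags):
        Xb, yb = stratified_bootstrap(X, y, rng)
        for rank, j in enumerate(ranker(Xb, yb)):
            rank_sum[j] += rank
    return sorted(range(n_features), key=lambda j: rank_sum[j])[:k]


def kuncheva_index(a, b, n_features):
    """Overlap of two equal-size signatures, corrected for chance.
    Requires 0 < k < n_features; 1.0 means identical signatures."""
    k = len(a)
    overlap = len(set(a) & set(b))
    expected = k * k / n_features
    return (overlap - expected) / (k - expected)


def stability(signatures, n_features):
    """Average pairwise Kuncheva index over a list of signatures."""
    pairs = list(combinations(signatures, 2))
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)


def make_toy_data(n_per_class=20, n_features=30, shift=5.0, seed=1):
    """Hypothetical toy data: only feature 0 separates the two classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += shift * label
            X.append(row)
            y.append(label)
    return X, y


X, y = make_toy_data()
# Four signatures from different resampling seeds, then their stability.
signatures = [ensemble_select(X, y, k=5, seed=s) for s in range(4)]
print(signatures[0], stability(signatures, n_features=30))
```

The `ranker` argument is the extension point: any function that returns feature indices ordered from best to worst can be passed in, so an SVM-based ranker could replace the stand-in without changing the aggregation or the stability computation.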

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
