An Ensemble Framework Coping with Instability in the Gene Selection Process.

Affiliations

IBSAL/BISITE Research Group, University of Salamanca, Edificio I+D+i, 37007, Salamanca, Spain.

CISUC, ECOS Research Group, University of Coimbra, Pólo II-Pinhal de Marrocos, 3030-290, Coimbra, Portugal.

Publication information

Interdiscip Sci. 2018 Mar;10(1):12-23. doi: 10.1007/s12539-017-0274-z. Epub 2018 Jan 8.

Abstract

This paper proposes an ensemble framework for gene selection that addresses the instability problems arising in the gene filtering task. Gene selection from gene expression data is a complex process prone to instability, because different filter methods yield different subsets of informative genes. This makes it difficult for experts to identify the significant genes. The instability of results can stem from the filter methods, the gene classification methods, the use of different datasets for the same disease, and the existence of multiple valid groups of biomarkers. Although many approaches have been proposed, the complexity of this problem remains a challenge today. The proposed framework comprises five stages of gene filtering aimed at discovering biomarkers for diagnosis and classification tasks. It performs stable feature selection in the face of the problems above, thus providing a more suitable and reliable solution for clinical and research purposes. Our proposal is a multistage gene filtering process that incorporates several ensemble strategies for gene selection, so that different classifiers assess gene subsets simultaneously to counter instability. First, we apply an ensemble of recent gene selection methods to obtain diversity in the genes found (stability with respect to filter methods). Next, we apply an ensemble of well-known classifiers to retain only the genes that are relevant to all classifiers at once (stability with respect to classification methods). The results were evaluated on two different datasets of the same disease (pancreatic ductal adenocarcinoma) to assess stability with respect to the disease, and they were promising.
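The two ensemble stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and covers only those two stages, not all five. It rests on assumptions the abstract does not state. Stage one is modeled as the union of each filter method's top-k genes, for diversity. Stage two is modeled as the intersection of the genes each classifier deems relevant. Cross-dataset stability is measured by the Jaccard index between the gene sets selected on two datasets. The function names (`filter_ensemble`, `classifier_ensemble`, `jaccard`), the toy gene labels, and the lambda stand-ins for classifiers are hypothetical placeholders. In practice each relevance check would train and evaluate a real classifier.

```python
from typing import Callable, Dict, List, Set

# Assumed interfaces (hypothetical, not from the paper):
# - a filter method yields genes ranked best-first;
# - a classifier relevance check maps candidate genes to the subset it finds useful.
Ranking = List[str]
RelevanceCheck = Callable[[Set[str]], Set[str]]


def filter_ensemble(rankings: Dict[str, Ranking], top_k: int) -> Set[str]:
    """Stage 1 (assumed): union of every filter's top-k genes, for diversity."""
    candidates: Set[str] = set()
    for ranking in rankings.values():
        candidates.update(ranking[:top_k])
    return candidates


def classifier_ensemble(candidates: Set[str],
                        checks: Dict[str, RelevanceCheck]) -> Set[str]:
    """Stage 2 (assumed): keep only genes that every classifier deems relevant."""
    kept = set(candidates)
    for relevant_subset in checks.values():
        kept &= relevant_subset(candidates)
    return kept


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed stability measure: overlap of two gene sets (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Toy rankings from two hypothetical filter methods on one dataset.
    rankings = {
        "filter_a": ["g1", "g2", "g3", "g4"],
        "filter_b": ["g2", "g5", "g1", "g6"],
    }
    candidates = filter_ensemble(rankings, top_k=2)
    print(sorted(candidates))  # ['g1', 'g2', 'g5']

    # Stand-ins for trained classifiers: each "finds" a fixed gene subset relevant.
    checks = {
        "clf_x": lambda genes: genes & {"g1", "g2"},
        "clf_y": lambda genes: genes & {"g2", "g5"},
    }
    selected = classifier_ensemble(candidates, checks)
    print(sorted(selected))  # ['g2']

    # Compare with a hypothetical selection obtained on a second dataset.
    print(jaccard(selected, {"g2", "g7"}))  # 0.5
```

Under these assumptions, the union in the first stage keeps every gene that any filter ranks highly. The intersection in the second stage then discards genes on which the classifiers disagree. A Jaccard value close to 1.0 between the gene sets selected on the two datasets would indicate stability with respect to the disease.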