Feature selection environment for genomic applications.

Author information

Lopes Fabrício Martins, Martins David Corrêa, Cesar Roberto M

Affiliation

Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, SP, Brazil.

Publication information

BMC Bioinformatics. 2008 Oct 22;9:451. doi: 10.1186/1471-2105-9-451.

Abstract

BACKGROUND

Feature selection is a pattern recognition approach that chooses important variables according to some criterion in order to distinguish or explain certain phenomena (i.e., it performs dimensionality reduction). Many genomic and proteomic applications rely on feature selection for tasks such as selecting signature genes that are informative about some biological state (e.g., normal tissue versus several types of cancer), or inferring a prediction network among elements such as genes, proteins and external stimuli. A recurrent problem in these applications is the lack of samples needed to adequately estimate the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, but it is difficult to point out the best solution for each application.
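To make the sample-scarcity problem concrete, the sketch below counts how many joint states have to be estimated as predictors are added, and how thinly a fixed set of samples is spread over them. The three-level quantization and the sample count of 50 are illustrative assumptions, not values taken from the paper, and the function names are hypothetical.

```python
def joint_state_count(num_predictors: int, levels: int = 3) -> int:
    """Number of joint configurations of `num_predictors` discrete variables.

    `levels` is an assumed quantization (e.g. under-, normally, over-expressed).
    """
    return levels ** num_predictors


def average_samples_per_state(num_samples: int, num_predictors: int,
                              levels: int = 3) -> float:
    """Average number of samples available per joint configuration."""
    return num_samples / joint_state_count(num_predictors, levels)


if __name__ == "__main__":
    # Assumed experiment size: 50 samples (hypothetical).
    for k in range(1, 6):
        print(k, joint_state_count(k), average_samples_per_state(50, k))
```

With these assumed numbers, the count of joint states grows exponentially with the number of predictors, so most configurations are observed rarely or never once a handful of predictors is combined.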

RESULTS

The intent of this work is to provide an open-source multiplatform graphical environment for bioinformatics problems, which supports many feature selection algorithms, criterion functions and graphic visualization tools such as scatterplots, parallel coordinates and graphs. A feature selection approach for growing genetic networks from seed genes (targets or predictors) is also implemented in the system.
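The abstract does not say which algorithms or criterion functions the environment ships with. As a minimal sketch of how a search algorithm and a criterion function fit together, the example below pairs a greedy sequential forward search with a mean conditional entropy criterion; both choices, the function names and the toy data are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, target, features):
    """Estimate H(target | features) in bits from observed counts.

    `rows` is a list of tuples of discrete values; `target` and the
    entries of `features` are column indices.
    """
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[f] for f in features)
        groups[key][row[target]] += 1
    total = len(rows)
    entropy = 0.0
    for counts in groups.values():
        n = sum(counts.values())
        h_group = -sum((c / n) * log2(c / n) for c in counts.values())
        entropy += (n / total) * h_group  # weight by how often the state occurs
    return entropy


def forward_selection(rows, target, candidates, max_features):
    """Greedily add the feature that lowers the criterion the most."""
    selected = []
    remaining = list(candidates)
    best_score = conditional_entropy(rows, target, selected)
    while remaining and len(selected) < max_features:
        scored = [(conditional_entropy(rows, target, selected + [f]), f)
                  for f in remaining]
        score, feature = min(scored)
        if score >= best_score:  # no candidate improves the criterion
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy data (hypothetical): columns 0-2 are candidate predictors,
# column 3 is the target and simply copies column 0.
TOY_ROWS = [
    (0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 1),
    (1, 1, 1, 1), (0, 1, 1, 0), (1, 0, 1, 1),
]

if __name__ == "__main__":
    print(forward_selection(TOY_ROWS, target=3, candidates=[0, 1, 2],
                            max_features=2))
```

A greedy search like this evaluates only a few feature subsets, which keeps it cheap, but it can miss predictors that are informative only jointly; that trade-off is one reason an environment offering several algorithms and criteria side by side is useful.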

CONCLUSION

The proposed feature selection environment allows data analysis using several algorithms, criterion functions and graphic visualization tools. Our experiments have shown the software's effectiveness in two distinct types of biological problems. In addition, the environment can be used in other pattern recognition applications, although its main focus is bioinformatics tasks.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/01c5/2655091/0f83c9d782f3/1471-2105-9-451-1.jpg
