Genetic test bed for feature selection.

Authors

Choudhary Ashish, Brun Marcel, Hua Jianping, Lowey James, Suh Ed, Dougherty Edward R

Affiliation

Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.

Publication information

Bioinformatics. 2006 Apr 1;22(7):837-42. doi: 10.1093/bioinformatics/btl008. Epub 2006 Jan 20.

DOI:10.1093/bioinformatics/btl008
PMID:16428263
Abstract

MOTIVATION

Given a large set of potential features, such as the set of all gene-expression values from a microarray, it is necessary to find a small subset with which to classify. The task of finding an optimal feature set of a given size is inherently combinatorial because, to assure optimality, all feature sets of that size must be checked. Thus, numerous suboptimal feature-selection algorithms have been proposed. There are strong impediments to evaluating feature-selection algorithms using real data when data are limited, a common situation in genetic classification. The difficulty is compound. First, there are no class-conditional distributions from which to draw data points, only a single small labeled sample. Second, there are no test data with which to estimate the feature-set errors, and one must depend on a training-data-based error estimator. Finally, there is no optimal feature set with which to compare the feature sets found by the algorithms.
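To make the combinatorial cost concrete, the short sketch below counts how many feature sets of a given size an exhaustive search would have to check. The feature counts and set sizes are illustrative assumptions, not figures from the paper, and `count_feature_sets` is a hypothetical helper name.

```python
import math


def count_feature_sets(n_features: int, size: int) -> int:
    """Number of distinct feature sets of `size` drawn from `n_features`."""
    return math.comb(n_features, size)


# Illustrative (assumed) numbers of candidate features and feature-set sizes.
for n_features in (50, 70, 1000):
    for size in (2, 3, 5):
        total = count_feature_sets(n_features, size)
        print(f"{n_features} features, sets of size {size}: {total:,} candidates")
```

The count grows roughly as the number of features raised to the power of the set size, which is why an exhaustive search is only practical for small set sizes or with heavy parallel computation.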

RESULTS

This paper describes a genetic test bed for the evaluation of feature-selection algorithms. It begins with a large biological feature-label dataset that is used as an empirical distribution and, using massively parallel computation, finds the top feature sets of various sizes based on a given sample size and classification rule. The user can draw random samples from the data, apply a proposed algorithm, and evaluate the proficiency of the proposed algorithm via three different measures (code provided). A key feature of the test bed is that, once a dataset is input, a single command creates the entire test bed relative to the dataset. The particular dataset used for the first version of the test bed comes from a microarray-based classification study that analyzes a large number of microarrays, prepared with RNA from breast tumor samples from each of 295 patients.
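The workflow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' software: the data are synthetic, the nearest-mean classifier and the class-mean-gap filter are stand-ins for "a given classification rule" and "a proposed algorithm", and the regret measure is one hypothetical way to compare a selected feature set with the best set found exhaustively (the abstract does not specify the three measures the test bed provides). All function names are invented for this sketch.

```python
import itertools
import random


def draw_sample(labels, n_per_class, rng):
    """Draw a class-balanced random sample of row indices, without replacement."""
    idx = []
    for label in (0, 1):
        members = [i for i, y in enumerate(labels) if y == label]
        idx.extend(rng.sample(members, n_per_class))
    return idx


def holdout_error(X, labels, train_idx, features):
    """Train a nearest-mean classifier on the sample; test on all other rows."""
    means = {}
    for label in (0, 1):
        rows = [X[i] for i in train_idx if labels[i] == label]
        means[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    train = set(train_idx)
    test_idx = [i for i in range(len(X)) if i not in train]
    wrong = 0
    for i in test_idx:
        dist = {lab: sum((X[i][f] - m) ** 2 for f, m in zip(features, mu))
                for lab, mu in means.items()}
        wrong += min(dist, key=dist.get) != labels[i]
    return wrong / len(test_idx)


def rank_feature_sets(X, labels, size, samples):
    """Exhaustively score every feature set of `size` by mean holdout error."""
    table = {}
    for fs in itertools.combinations(range(len(X[0])), size):
        errs = [holdout_error(X, labels, s, fs) for s in samples]
        table[fs] = sum(errs) / len(errs)
    return table


def filter_select(X, labels, train_idx, size):
    """Stand-in 'proposed algorithm': keep features with the largest class-mean gap."""
    def gap(f):
        m = []
        for label in (0, 1):
            vals = [X[i][f] for i in train_idx if labels[i] == label]
            m.append(sum(vals) / len(vals))
        return abs(m[0] - m[1])
    best = sorted(range(len(X[0])), key=gap, reverse=True)[:size]
    return tuple(sorted(best))


rng = random.Random(0)
n_points, n_features = 80, 6
labels = [i % 2 for i in range(n_points)]
# Synthetic stand-in dataset: features 0 and 1 shift with the label, the rest are noise.
X = [tuple(rng.gauss(1.5 * y if f < 2 else 0.0, 1.0) for f in range(n_features))
     for y in labels]

# Treat the full dataset as the empirical distribution and draw small samples from it.
samples = [draw_sample(labels, 10, rng) for _ in range(30)]
table = rank_feature_sets(X, labels, 2, samples)
best_error = min(table.values())

# Hypothetical measure: average excess error of the selected set over the best set.
regret = sum(table[filter_select(X, labels, s, 2)] - best_error
             for s in samples) / len(samples)
print(f"best error {best_error:.3f}, mean regret {regret:.3f}")
```

Because the selected set is scored against the same exhaustively computed table, the regret is never negative; a value near zero would indicate that the stand-in algorithm usually finds a feature set about as good as the best one for this sample size.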

AVAILABILITY

The software and supplementary material are available at http://public.tgen.org/tgen-cb/support/testbed/

CONTACT

edward@ece.tamu.edu.

Similar articles

1. Genetic test bed for feature selection.
   Bioinformatics. 2006 Apr 1;22(7):837-42. doi: 10.1093/bioinformatics/btl008. Epub 2006 Jan 20.
2. Optimal number of features as a function of sample size for various classification rules.
   Bioinformatics. 2005 Apr 15;21(8):1509-15. doi: 10.1093/bioinformatics/bti171. Epub 2004 Nov 30.
3. What should be expected from feature selection in small-sample settings.
   Bioinformatics. 2006 Oct 1;22(19):2430-6. doi: 10.1093/bioinformatics/btl407. Epub 2006 Jul 26.
4. The ties problem resulting from counting-based error estimators and its impact on gene selection algorithms.
   Bioinformatics. 2006 Oct 15;22(20):2507-15. doi: 10.1093/bioinformatics/btl438. Epub 2006 Aug 14.
5. Reporting bias when using real data sets to analyze classification performance.
   Bioinformatics. 2010 Jan 1;26(1):68-76. doi: 10.1093/bioinformatics/btp605. Epub 2009 Oct 21.
6. Classification of microarray data with factor mixture models.
   Bioinformatics. 2006 Jan 15;22(2):202-8. doi: 10.1093/bioinformatics/bti779. Epub 2005 Nov 15.
7. Classification based upon gene expression data: bias and precision of error rates.
   Bioinformatics. 2007 Jun 1;23(11):1363-70. doi: 10.1093/bioinformatics/btm117. Epub 2007 Mar 28.
8. Practical FDR-based sample size calculations in microarray experiments.
   Bioinformatics. 2005 Aug 1;21(15):3264-72. doi: 10.1093/bioinformatics/bti519. Epub 2005 Jun 2.
9. Reliable gene signatures for microarray classification: assessment of stability and performance.
   Bioinformatics. 2006 Oct 1;22(19):2356-63. doi: 10.1093/bioinformatics/btl400. Epub 2006 Jul 31.
10. SoFoCles: feature filtering for microarray classification based on gene ontology.
    J Biomed Inform. 2010 Feb;43(1):1-14. doi: 10.1016/j.jbi.2009.06.002. Epub 2009 Jul 1.

Cited by

1. Gene selection for cancer classification with the help of bees.
   BMC Med Genomics. 2016 Aug 10;9 Suppl 2(Suppl 2):47. doi: 10.1186/s12920-016-0204-7.
2. An algorithm for finding biologically significant features in microarray data based on a priori manifold learning.
   PLoS One. 2014 Mar 3;9(3):e90562. doi: 10.1371/journal.pone.0090562. eCollection 2014.
3. A hybrid BPSO-CGA approach for gene selection and classification of microarray data.
   J Comput Biol. 2012 Jan;19(1):68-82. doi: 10.1089/cmb.2010.0064. Epub 2011 Jan 6.
4. Performance of feature selection methods.
   Curr Genomics. 2009 Sep;10(6):365-74. doi: 10.2174/138920209789177629.
5. Merging microarray data, robust feature selection, and predicting prognosis in prostate cancer.
   Cancer Inform. 2007 Feb 14;2:87-97.
6. MIST: Maximum Information Spanning Trees for dimension reduction of biological data sets.
   Bioinformatics. 2009 May 1;25(9):1165-72. doi: 10.1093/bioinformatics/btp109. Epub 2009 Mar 4.
7. Validation of computational methods in genomics.
   Curr Genomics. 2007 Mar;8(1):1-19. doi: 10.2174/138920207780076956.
8. Which is better: holdout or full-sample classifier design?
   EURASIP J Bioinform Syst Biol. 2008;2008(1):297945. doi: 10.1155/2008/297945.
9. Quantification of the impact of feature selection on the variance of cross-validation error estimation.
   EURASIP J Bioinform Syst Biol. 2007;2007(1):16354. doi: 10.1155/2007/16354.