Superior feature-set ranking for small samples using bolstered error estimation.

Authors

Sima Chao, Braga-Neto Ulisses, Dougherty Edward R

Affiliation

Department of Electrical Engineering, Texas A&M University College Station, TX, USA.

Publication information

Bioinformatics. 2005 Apr 1;21(7):1046-54. doi: 10.1093/bioinformatics/bti081. Epub 2004 Oct 28.

DOI:10.1093/bioinformatics/bti081
PMID:15514003
Abstract

MOTIVATION

Ranking feature sets is a key issue for classification, for instance, phenotype classification based on gene expression. Since ranking is often based on error estimation, and error estimators suffer from differing degrees of imprecision in small-sample settings, it is important to choose a computationally feasible error estimator that yields good feature-set ranking.

RESULTS

This paper examines the feature-ranking performance of several kinds of error estimators: resubstitution, cross-validation, bootstrap and bolstered error estimation. It does so for three classification rules: linear discriminant analysis, three-nearest-neighbor classification and classification trees. Two measures of performance are considered. One counts the number of the truly best feature sets appearing among the best feature sets discovered by the error estimator and the other computes the mean absolute error between the top ranks of the truly best feature sets and their ranks as given by the error estimator. Our results indicate that bolstering is superior to bootstrap, and bootstrap is better than cross-validation, for discovering top-performing feature sets for classification when using small samples. A key issue is that bolstered error estimation is tens of times faster than bootstrap, and faster than cross-validation, and is therefore feasible for feature-set ranking when the number of feature sets is extremely large.
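
The two performance measures described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the parameter `k` (how many top feature sets to consider), and the index-based tie-breaking rule are assumptions made for illustration. Each feature set is assumed to come with a true error and an estimated error, and a lower error means a better rank.

```python
from typing import List, Sequence


def ranks_from_errors(errors: Sequence[float]) -> List[int]:
    """Return 1-based ranks (1 = lowest error). Ties are broken by index (an assumption)."""
    order = sorted(range(len(errors)), key=lambda i: (errors[i], i))
    ranks = [0] * len(errors)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def top_k_overlap(true_errors: Sequence[float], est_errors: Sequence[float], k: int) -> int:
    """Count the truly best k feature sets that also appear in the estimator's top k."""
    if not 1 <= k <= len(true_errors) or len(true_errors) != len(est_errors):
        raise ValueError("k must be in [1, n] and both error lists must have length n")
    true_ranks = ranks_from_errors(true_errors)
    est_ranks = ranks_from_errors(est_errors)
    true_top = {i for i, r in enumerate(true_ranks) if r <= k}
    est_top = {i for i, r in enumerate(est_ranks) if r <= k}
    return len(true_top & est_top)


def mean_abs_rank_deviation(true_errors: Sequence[float], est_errors: Sequence[float], k: int) -> float:
    """Mean absolute difference between true and estimated ranks over the truly best k sets."""
    if not 1 <= k <= len(true_errors) or len(true_errors) != len(est_errors):
        raise ValueError("k must be in [1, n] and both error lists must have length n")
    true_ranks = ranks_from_errors(true_errors)
    est_ranks = ranks_from_errors(est_errors)
    top = [i for i, r in enumerate(true_ranks) if r <= k]
    return sum(abs(true_ranks[i] - est_ranks[i]) for i in top) / k


if __name__ == "__main__":
    # Hypothetical errors for five feature sets (illustrative values, not data from the paper).
    true_errors = [0.10, 0.20, 0.15, 0.30, 0.25]
    est_errors = [0.12, 0.11, 0.30, 0.20, 0.25]
    print(top_k_overlap(true_errors, est_errors, k=2))
    print(mean_abs_rank_deviation(true_errors, est_errors, k=2))
```

Under this reading, a good error estimator for ranking yields a high overlap count and a low mean rank deviation, even if its error estimates are individually biased.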

Similar articles

1. Superior feature-set ranking for small samples using bolstered error estimation.
   Bioinformatics. 2005 Apr 1;21(7):1046-54. doi: 10.1093/bioinformatics/bti081. Epub 2004 Oct 28.
2. What should be expected from feature selection in small-sample settings.
   Bioinformatics. 2006 Oct 1;22(19):2430-6. doi: 10.1093/bioinformatics/btl407. Epub 2006 Jul 26.
3. Estimating misclassification error with small samples via bootstrap cross-validation.
   Bioinformatics. 2005 May 1;21(9):1979-86. doi: 10.1093/bioinformatics/bti294. Epub 2005 Feb 2.
4. Is cross-validation better than resubstitution for ranking genes?
   Bioinformatics. 2004 Jan 22;20(2):253-8. doi: 10.1093/bioinformatics/btg399.
5. Optimal number of features as a function of sample size for various classification rules.
   Bioinformatics. 2005 Apr 15;21(8):1509-15. doi: 10.1093/bioinformatics/bti171. Epub 2004 Nov 30.
6. Reporting bias when using real data sets to analyze classification performance.
   Bioinformatics. 2010 Jan 1;26(1):68-76. doi: 10.1093/bioinformatics/btp605. Epub 2009 Oct 21.
7. Prediction error estimation: a comparison of resampling methods.
   Bioinformatics. 2005 Aug 1;21(15):3301-7. doi: 10.1093/bioinformatics/bti499. Epub 2005 May 19.
8. Genetic test bed for feature selection.
   Bioinformatics. 2006 Apr 1;22(7):837-42. doi: 10.1093/bioinformatics/btl008. Epub 2006 Jan 20.
9. The ties problem resulting from counting-based error estimators and its impact on gene selection algorithms.
   Bioinformatics. 2006 Oct 15;22(20):2507-15. doi: 10.1093/bioinformatics/btl438. Epub 2006 Aug 14.
10. Is cross-validation valid for small-sample microarray classification?
   Bioinformatics. 2004 Feb 12;20(3):374-80. doi: 10.1093/bioinformatics/btg419.

Cited by

1. The Model-Based Study of the Effectiveness of Reporting Lists of Small Feature Sets Using RNA-Seq Data.
   Cancer Inform. 2017 Jun 12;16:1176935117710530. doi: 10.1177/1176935117710530. eCollection 2017.
2. PoplarGene: poplar gene network and resource for mining functional information for genes from woody plants.
   Sci Rep. 2016 Aug 12;6:31356. doi: 10.1038/srep31356.
3. High-dimensional bolstered error estimation.
   Bioinformatics. 2011 Nov 1;27(21):3056-64. doi: 10.1093/bioinformatics/btr518. Epub 2011 Sep 13.
4. Characterization of the effectiveness of reporting lists of small feature sets relative to the accuracy of the prior biological knowledge.
   Cancer Inform. 2010 Mar 18;9:49-60. doi: 10.4137/cin.s4020.
5. Gene expression profiling during early acute febrile stage of dengue infection can predict the disease outcome.
   PLoS One. 2009 Nov 19;4(11):e7892. doi: 10.1371/journal.pone.0007892.
6. Validation of computational methods in genomics.
   Curr Genomics. 2007 Mar;8(1):1-19. doi: 10.2174/138920207780076956.
7. Decorrelation of the true and estimated classifier errors in high-dimensional settings.
   EURASIP J Bioinform Syst Biol. 2007;2007(1):38473. doi: 10.1155/2007/38473.
8. An improved, bias-reduced probabilistic functional gene network of baker's yeast, Saccharomyces cerevisiae.
   PLoS One. 2007 Oct 3;2(10):e988. doi: 10.1371/journal.pone.0000988.
9. Quantification of the impact of feature selection on the variance of cross-validation error estimation.
   EURASIP J Bioinform Syst Biol. 2007;2007(1):16354. doi: 10.1155/2007/16354.
10. Prognostic testing in uveal melanoma by transcriptomic profiling of fine needle biopsy specimens.
   J Mol Diagn. 2006 Nov;8(5):567-73. doi: 10.2353/jmoldx.2006.060077.