Dataset complexity in gene expression based cancer classification using ensembles of k-nearest neighbors.

Author information

Okun Oleg, Priisalu Helen

Affiliation

University of Oulu, Department of Electrical and Information Engineering, P.O. Box 4500, Oulu 90014, Finland.

Publication information

Artif Intell Med. 2009 Feb-Mar;45(2-3):151-62. doi: 10.1016/j.artmed.2008.08.004. Epub 2008 Sep 14.

Abstract

OBJECTIVE

We explore the link between dataset complexity, which determines how difficult a dataset is to classify, and classification performance, measured by the low-variance, low-bias bolstered resubstitution error of k-nearest neighbor classifiers.
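To make the error measure concrete, the following is a minimal Monte Carlo sketch of a bolstered resubstitution estimate for a k-nearest neighbor classifier. It assumes spherical Gaussian bolstering kernels with a single fixed width `sigma`; the abstract does not say how the kernel width is chosen, so that parameter, the function names, and the sample count `n_mc` are illustrative assumptions rather than the authors' procedure.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Majority-vote k-NN with Euclidean distance; labels are ints 0..C-1."""
    # Pairwise squared distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]
    n_classes = int(y_train.max()) + 1
    # Count votes per class for every query point.
    counts = np.stack([(votes == c).sum(axis=1) for c in range(n_classes)], axis=1)
    return counts.argmax(axis=1)

def bolstered_resub_error(X, y, k=3, sigma=0.5, n_mc=50, seed=0):
    """Monte Carlo estimate of the bolstered resubstitution error.

    Each training point is replaced by a Gaussian kernel centered on it;
    the error is the fraction of kernel mass that the classifier assigns
    to the wrong class. `sigma` is a fixed kernel width (an assumption).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Draw n_mc perturbed copies of every training point.
    noise = rng.normal(scale=sigma, size=(n, n_mc, d))
    samples = (X[:, None, :] + noise).reshape(n * n_mc, d)
    pred = knn_predict(X, y, samples, k).reshape(n, n_mc)
    return float((pred != y[:, None]).mean())
```

Spreading each training point over a small neighborhood is what smooths the plain resubstitution count: a point lying close to the decision boundary contributes a fractional error instead of 0 or 1.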

METHODS AND MATERIAL

Gene expression based cancer classification serves as the task in this study. Six gene expression datasets covering different types of cancer constitute the test data.

RESULTS

Through extensive simulation, coupled with the copula method for analyzing association in bivariate data, we show that dataset complexity and bolstered resubstitution error are statistically dependent. We therefore propose a new scheme for generating ensembles of classifiers that selects low-complexity feature subsets for the ensemble members; according to the dependence relation found, such subsets yield accurate members.
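The selection idea can be sketched as follows: draw random feature subsets, score each with a complexity measure, keep the least complex ones, and let a k-nearest neighbor classifier on each kept subset vote. The abstract does not name the complexity measure, so this sketch substitutes Fisher's discriminant ratio (a higher ratio is taken to mean lower complexity) for a two-class problem. All function names, parameters, and defaults are hypothetical.

```python
import numpy as np

def fisher_ratio(X, y):
    """Max over features of (mu0 - mu1)^2 / (var0 + var1), for labels 0/1.

    Used here as a stand-in complexity measure: higher means easier.
    """
    a, b = X[y == 0], X[y == 1]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0) + 1e-12  # guard against zero variance
    return float((num / den).max())

def knn_predict(X_train, y_train, X_query, k=3):
    """Majority-vote k-NN with Euclidean distance; labels are 0/1."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    # With odd k, a mean vote above 0.5 means class 1 wins.
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

def select_subsets(X, y, n_members=5, n_candidates=50, subset_size=3, seed=0):
    """Draw random feature subsets and keep the n_members least complex."""
    rng = np.random.default_rng(seed)
    candidates = [rng.choice(X.shape[1], size=subset_size, replace=False)
                  for _ in range(n_candidates)]
    # Descending Fisher ratio corresponds to ascending complexity.
    candidates.sort(key=lambda f: -fisher_ratio(X[:, f], y))
    return candidates[:n_members]

def ensemble_predict(X_train, y_train, X_query, subsets, k=3):
    """Majority vote over one k-NN classifier per feature subset."""
    votes = np.stack([knn_predict(X_train[:, f], y_train, X_query[:, f], k)
                      for f in subsets])
    # Use an odd number of members so the vote cannot tie.
    return (votes.mean(axis=0) > 0.5).astype(int)
```

A traditional random-subspace ensemble would skip the sorting step and keep the first `n_members` candidates regardless of their score, which is the complexity-ignorant baseline the abstract compares against.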

CONCLUSION

Experiments with six gene expression datasets demonstrate that our ensemble-generating scheme, based on the dependence between dataset complexity and classification error, is superior to the single best classifier in the ensemble and to the traditional ensemble construction scheme that ignores dataset complexity.
