A new dataset evaluation method based on category overlap.

Affiliation

Department of Nanobiomedical Science, Dankook University, Cheonan 330-714, Republic of Korea.

Publication information

Comput Biol Med. 2011 Feb;41(2):115-22. doi: 10.1016/j.compbiomed.2010.12.006. Epub 2011 Jan 8.

DOI: 10.1016/j.compbiomed.2010.12.006
PMID: 21216397
Abstract

The quality of dataset has a profound effect on classification accuracy, and there is a clear need for some method to evaluate this quality. In this paper, we propose a new dataset evaluation method using the R-value measure. This proposed method is based on the ratio of overlapping areas among categories in a dataset. A high R-value for a dataset indicates that the dataset contains wide overlapping areas among its categories, and classification accuracy on the dataset may become low. We can use the R-value measure to understand the characteristics of a dataset, the feature selection process, and the proper design of new classifiers.
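
The abstract does not spell out how the R-value is computed, so the following is only a minimal sketch of one way an overlap ratio of this kind could be estimated. It assumes a nearest-neighbor notion of overlap: a sample counts as lying in an overlapping area when more than `theta` of its `k` nearest neighbors belong to a different category. The function name `r_value` and the parameters `k` and `theta` are illustrative assumptions, not taken from the abstract.

```python
import math


def r_value(points, labels, k=3, theta=1):
    """Estimate category overlap as a ratio in [0, 1].

    Assumption (not stated in the abstract): a sample is "overlapped" when
    more than `theta` of its `k` nearest neighbours carry a different label.
    """
    n = len(points)
    if n <= k:
        raise ValueError("need more samples than k")
    overlapped = 0
    for i in range(n):
        # Euclidean distances from sample i to every other sample.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        # Count neighbours among the k nearest that have a different label.
        foreign = sum(1 for _, j in dists[:k] if labels[j] != labels[i])
        if foreign > theta:
            overlapped += 1
    return overlapped / n


# Two well-separated clusters versus two interleaved categories.
separated = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
separated_labels = [0, 0, 0, 0, 1, 1, 1, 1]
interleaved = [(0,), (1,), (2,), (3,), (4,), (5,)]
interleaved_labels = [0, 1, 0, 1, 0, 1]

print(r_value(separated, separated_labels, k=3, theta=1))
print(r_value(interleaved, interleaved_labels, k=2, theta=1))
```

Under these assumptions, the separated clusters yield a low ratio and the interleaved categories a high one, matching the abstract's reading that a high R-value signals wide overlap and likely lower classification accuracy.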

Similar articles

1
A new dataset evaluation method based on category overlap.
Comput Biol Med. 2011 Feb;41(2):115-22. doi: 10.1016/j.compbiomed.2010.12.006. Epub 2011 Jan 8.
2
RFS: efficient feature selection method based on R-value.
Comput Biol Med. 2013 Feb;43(2):91-9. doi: 10.1016/j.compbiomed.2012.11.010. Epub 2012 Dec 20.
3
Protein classification based on text document classification techniques.
Proteins. 2005 Mar 1;58(4):955-70. doi: 10.1002/prot.20373.
4
A novel feature selection approach for biomedical data classification.
J Biomed Inform. 2010 Feb;43(1):15-23. doi: 10.1016/j.jbi.2009.07.008. Epub 2009 Jul 30.
5
Statistical geometry based prediction of nonsynonymous SNP functional effects using random forest and neuro-fuzzy classifiers.
Proteins. 2008 Jun;71(4):1930-9. doi: 10.1002/prot.21838.
6
Improving the classification of multiple disorders with problem decomposition.
J Biomed Inform. 2006 Dec;39(6):612-25. doi: 10.1016/j.jbi.2005.12.001. Epub 2006 Jan 18.
7
Fuzzy-rough supervised attribute clustering algorithm and classification of microarray data.
IEEE Trans Syst Man Cybern B Cybern. 2011 Feb;41(1):222-33. doi: 10.1109/TSMCB.2010.2050684. Epub 2010 Jun 10.
8
Integrated feature and parameter optimization for an evolving spiking neural network: exploring heterogeneous probabilistic models.
Neural Netw. 2009 Jul-Aug;22(5-6):623-32. doi: 10.1016/j.neunet.2009.06.038. Epub 2009 Jul 2.
9
Bayesian neural network approaches to ovarian cancer identification from high-resolution mass spectrometry data.
Bioinformatics. 2005 Jun;21 Suppl 1:i487-94. doi: 10.1093/bioinformatics/bti1030.
10
Contourlet-based mammography mass classification using the SVM family.
Comput Biol Med. 2010 Apr;40(4):373-83. doi: 10.1016/j.compbiomed.2009.12.006. Epub 2010 Feb 23.

Cited by

1
DREAMER: a computational framework to evaluate readiness of datasets for machine learning.
BMC Med Inform Decis Mak. 2024 Jun 4;24(1):152. doi: 10.1186/s12911-024-02544-w.
2
Mapping of morpho-electric features to molecular identity of cortical inhibitory neurons.
PLoS Comput Biol. 2023 Jan 5;19(1):e1010058. doi: 10.1371/journal.pcbi.1010058. eCollection 2023 Jan.
3
Feature Ranking and Screening for Class-Imbalanced Metabolomics Data Based on Rank Aggregation Coupled with Re-Balance.
Metabolites. 2021 Jun 14;11(6):389. doi: 10.3390/metabo11060389.
4
A machine learning approach for specification of spinal cord injuries using fractional anisotropy values obtained from diffusion tensor images.
Comput Math Methods Med. 2014;2014:276589. doi: 10.1155/2014/276589. Epub 2014 Jan 21.
5
CBFS: high performance feature selection algorithm based on feature clearness.
PLoS One. 2012;7(7):e40419. doi: 10.1371/journal.pone.0040419. Epub 2012 Jul 6.