Department of Nanobiomedical Science, Dankook University, Cheonan 330-714, Republic of Korea.
Comput Biol Med. 2011 Feb;41(2):115-22. doi: 10.1016/j.compbiomed.2010.12.006. Epub 2011 Jan 8.
The quality of a dataset has a profound effect on classification accuracy, so a method for evaluating that quality is clearly needed. In this paper, we propose a new dataset evaluation method using the R-value measure. The method is based on the ratio of overlapping areas among the categories in a dataset. A high R-value indicates that the dataset contains wide overlapping areas among its categories, so classification accuracy on that dataset is likely to be low. The R-value measure can be used to understand the characteristics of a dataset, to guide the feature selection process, and to inform the design of new classifiers.
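The abstract does not give a formula for the overlap ratio, so the following is only a minimal sketch of one way such a measure could be estimated. It assumes a nearest-neighbour reading of "overlapping area": a sample is counted as lying in an overlap region when more than `theta` of its `k` nearest neighbours belong to a different category, and the R-value is the fraction of such samples. The function name `r_value`, the parameters `k` and `theta` and their defaults, and the use of Euclidean distance are all illustrative assumptions, not details taken from the abstract.

```python
import math


def r_value(points, labels, k=7, theta=3):
    """Estimate the fraction of samples that lie in an overlap region.

    Assumption (not from the abstract): a sample is "overlapping" when
    more than `theta` of its `k` nearest neighbours (Euclidean distance)
    carry a different label from its own.
    """
    n = len(points)
    if n != len(labels):
        raise ValueError("points and labels must have the same length")
    if n < 2:
        raise ValueError("need at least two samples")

    k = min(k, n - 1)  # a sample cannot have more than n - 1 neighbours
    overlapping = 0
    for i in range(n):
        # Indices of the k samples closest to sample i (excluding i itself).
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        # Count neighbours that belong to another category.
        foreign = sum(1 for j in neighbours if labels[j] != labels[i])
        if foreign > theta:
            overlapping += 1
    return overlapping / n


# Two well-separated categories: no sample has a foreign near neighbour.
separated_points = [(0, 0), (0, 1), (1, 0), (1, 1),
                    (10, 10), (10, 11), (11, 10), (11, 11)]
separated_labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

# Two fully interleaved categories on a line: every sample is surrounded
# by samples of the other category.
mixed_points = [(x,) for x in range(8)]
mixed_labels = ["a", "b"] * 4

print(r_value(separated_points, separated_labels, k=3, theta=1))
print(r_value(mixed_points, mixed_labels, k=2, theta=0))
```

Under these assumptions the separated dataset yields a low value and the interleaved dataset a high one, which matches the abstract's claim that a high R-value signals wide overlap and likely low classification accuracy. The brute-force neighbour search is quadratic in the number of samples and is meant only for illustration.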