Castano Alberto, Gonzalez Pablo, Gonzalez Jaime Alonso, Del Coz Juan Jose
IEEE Trans Neural Netw Learn Syst. 2022 Jun 8;PP. doi: 10.1109/TNNLS.2022.3179355.
The goal of quantification learning is to induce models capable of accurately predicting the class distribution for new bags of unseen examples. These models return only the prevalence of each class in the bag, because predictions for individual examples are irrelevant in these tasks. A prototypical application of ordinal quantification, where the classes follow a natural order, is to predict the proportion of opinions that fall into each category from one to five stars. Ordinal quantification has hardly been studied in the literature, and in fact, only one approach has been proposed so far. This article presents a comprehensive study of ordinal quantification, analyzing the applicability of the most important algorithms devised for multiclass quantification and proposing three new methods that are based on matching distributions using Earth mover's distance (EMD). Empirical experiments compare 14 algorithms on synthetic and benchmark data. To statistically analyze the obtained results, we further introduce an EMD-based scoring function. The main conclusion is that methods using a criterion related in some way to EMD, including two of our proposals, obtain significantly better results.
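To make the role of EMD concrete, the following is a minimal sketch of how the distance between two class-prevalence vectors could be computed for ordered classes such as star ratings. The abstract does not give the formula used in the article, so this assumes the common setting of unit distance between adjacent classes, in which EMD reduces to the sum of absolute differences between the two cumulative distributions. The function name `ordinal_emd` and the example prevalences are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def ordinal_emd(p, q):
    """EMD between two prevalence vectors over the same ordered classes.

    Assumes adjacent classes are one unit apart, so the EMD equals the
    sum of absolute differences between the cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("both prevalence vectors must cover the same classes")
    cum_p = list(accumulate(p))
    cum_q = list(accumulate(q))
    # The last cumulative value is 1 for both vectors, so it adds nothing.
    return sum(abs(a - b) for a, b in zip(cum_p[:-1], cum_q[:-1]))


# Hypothetical prevalences for ratings from one to five stars.
true_prev = [0.1, 0.2, 0.4, 0.2, 0.1]
# Estimate that misplaces 0.1 of the mass by one class (3 stars -> 4 stars).
near_miss = [0.1, 0.2, 0.3, 0.3, 0.1]
# Estimate that misplaces the same mass by two classes (3 stars -> 5 stars).
far_miss = [0.1, 0.2, 0.3, 0.2, 0.2]

print(ordinal_emd(true_prev, near_miss))
print(ordinal_emd(true_prev, far_miss))
```

Both estimates move the same amount of mass, but the second moves it twice as far along the rating scale and so receives the larger distance. A measure that ignores class order, such as the plain absolute error between the vectors, would score the two estimates identically, which is one reason an order-aware criterion like EMD suits ordinal quantification.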