Fraunhofer Center for Machine Learning, Kaiserslautern, Germany.
Fraunhofer Institute for Industrial Mathematics ITWM, Kaiserslautern, Germany.
PLoS One. 2023 Jan 17;18(1):e0279876. doi: 10.1371/journal.pone.0279876. eCollection 2023.
We propose a novel methodology for general multi-class classification in arbitrary feature spaces, which results in a potentially well-calibrated classifier. Calibrated classifiers are important in many applications because, in addition to predicting class labels, they also yield a confidence level for each prediction. In essence, the training of our classifier proceeds in two steps. In the first step, the training data is represented in a latent space whose geometry is induced by a regular (n - 1)-dimensional simplex, where n is the number of classes. We design this representation so that it reflects well the feature-space distances of the data points to their own-class and foreign-class neighbors. In the second step, the latent-space representation of the training data is extended to the whole feature space by fitting a regression model to the transformed data. With this latent-space representation, our calibrated classifier is readily defined. We rigorously establish its core theoretical properties and benchmark its prediction and calibration properties on various synthetic and real-world data sets from different application domains.
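To make the simplex geometry concrete, the following is a minimal sketch and not the authors' implementation. It assumes the regular (n - 1)-dimensional simplex is embedded in R^n with the standard basis vectors as vertices, one per class, and it uses a hypothetical nearest-vertex rule to turn a latent point into a class label. The abstract does not specify the actual latent representation, the regression model, or the confidence computation, so none of those are reproduced here; the names `simplex_vertices`, `nearest_vertex`, and `latent_point` are illustrative.

```python
import math


def simplex_vertices(n):
    """Vertices of a regular (n - 1)-simplex, embedded in R^n as the
    standard basis vectors (an assumed embedding, one vertex per class)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_vertex(point, vertices):
    """Index of the vertex closest to `point` (hypothetical decision rule)."""
    return min(range(len(vertices)), key=lambda k: euclidean(point, vertices[k]))


n_classes = 3
vertices = simplex_vertices(n_classes)

# In a regular simplex all pairwise vertex distances coincide;
# for this embedding each one equals sqrt(2).
pairwise = [
    euclidean(vertices[i], vertices[j])
    for i in range(n_classes)
    for j in range(i + 1, n_classes)
]

# An illustrative latent point, such as a regression model might output.
latent_point = [0.7, 0.2, 0.1]
predicted = nearest_vertex(latent_point, vertices)
```

In this embedding, a point with non-negative coordinates that sum to 1 lies inside the simplex, and its coordinates are its barycentric weights with respect to the class vertices. This is one way to see why a simplex-shaped latent space lends itself to per-class confidence values.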