IEEE Trans Neural Netw Learn Syst. 2017 Jul;28(7):1716-1721. doi: 10.1109/TNNLS.2016.2546956. Epub 2016 Apr 6.
Obtaining enough accurate labels to form a training set for a classifier can be difficult because access to reliable labeling resources is limited. In real-world applications, less accurate labels, such as those from nonexpert labelers, are often used instead. However, learning with less accurate labels can seriously degrade performance because of the high noise rate. Although several learning methods (e.g., noise-tolerant classifiers) have been proposed to improve classification performance in the presence of label noise, only a few take the noise rate into account and use both the noisy but easily accessible labels and less-noisy labels, a small number of which can be obtained at an acceptable additional cost in time and expense. In this brief, we propose a learning method that uses not only the noisy labels but also auxiliary less-noisy labels, which are available for a small portion of the training data. Based on a flipping-probability noise model and a logistic regression classifier, the method estimates the noise rate parameters, infers the ground-truth labels, and learns the classifier simultaneously by maximum likelihood. The proposed method yields three learning algorithms, corresponding to three states of prior knowledge about the less-noisy labels. Experiments show that the proposed method is tolerant to label noise and outperforms classifiers that do not explicitly consider the auxiliary less-noisy labels.
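The joint estimation described in the abstract can be illustrated with a generic expectation-maximization loop. The following is a minimal sketch, not the algorithms of the brief: it assumes binary labels, class-conditional flipping rates `rho0` = P(observed 1 | true 0) and `rho1` = P(observed 0 | true 1), and, as a simplifying assumption, it treats the auxiliary less-noisy labels as fully trusted. The function name `fit_flip_noise_logreg`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_flip_noise_logreg(X, y_noisy, aux_idx, y_aux,
                          n_iter=50, inner_steps=20, lr=0.5):
    """EM sketch: logistic regression under a label-flipping noise model.

    aux_idx / y_aux mark a small subset whose labels are assumed trusted
    (a simplification made for this example only).
    """
    n, d = X.shape
    w = np.zeros(d)
    rho0, rho1 = 0.2, 0.2  # arbitrary initial guesses for the flip rates
    for _ in range(n_iter):
        # E-step: posterior probability that each true label is 1.
        p1 = sigmoid(X @ w)
        like1 = np.where(y_noisy == 1, 1.0 - rho1, rho1)  # P(y_noisy | true=1)
        like0 = np.where(y_noisy == 1, rho0, 1.0 - rho0)  # P(y_noisy | true=0)
        q = p1 * like1 / (p1 * like1 + (1.0 - p1) * like0)
        q[aux_idx] = y_aux  # clamp the posterior where a trusted label exists

        # M-step, noise rates: closed-form updates from the soft labels.
        rho1 = np.clip(np.sum(q * (y_noisy == 0)) / np.sum(q),
                       1e-6, 1 - 1e-6)
        rho0 = np.clip(np.sum((1 - q) * (y_noisy == 1)) / np.sum(1 - q),
                       1e-6, 1 - 1e-6)

        # M-step, classifier: gradient ascent on the expected log-likelihood.
        for _ in range(inner_steps):
            w += lr * (X.T @ (q - sigmoid(X @ w))) / n
    return w, rho0, rho1, q


# Synthetic demonstration (all values below are made up for illustration).
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([rng.normal(size=(n, 2)), np.ones(n)])  # last column: bias
y_true = (X @ np.array([3.0, -2.0, 0.0]) > 0).astype(int)
flip = rng.random(n) < 0.2                     # 20% symmetric label flips
y_noisy = np.where(flip, 1 - y_true, y_true)
aux_idx = rng.choice(n, size=n // 10, replace=False)  # 10% trusted labels

w, rho0, rho1, q = fit_flip_noise_logreg(X, y_noisy, aux_idx, y_true[aux_idx])
acc = np.mean((X @ w > 0).astype(int) == y_true)
print(f"rho0={rho0:.3f} rho1={rho1:.3f} accuracy={acc:.3f}")
```

Clamping the posterior on the auxiliary subset is only one way to use the less-noisy labels; giving that subset its own, smaller flipping rates would be a natural variation when those labels cannot be fully trusted.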