Big Data Research Center, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.
Guangxi Key Laboratory of Hybrid Computation and IC Design Analysis, Guangxi University for Nationalities, Nanning 530006, China.
Sensors (Basel). 2020 May 8;20(9):2684. doi: 10.3390/s20092684.
Traffic sign recognition is a classification problem that remains challenging for computer vision and machine learning algorithms. Although techniques in both fields have steadily improved, the rapid growth in the number of unlabeled traffic sign images has made the problem harder. Collecting and labeling large datasets is tedious and expensive, demanding considerable time, expert knowledge, and funding to meet the data requirements of deep neural networks. Class imbalance in the data makes it harder still for these algorithms to perform well. Together, these problems call for algorithms that can fully exploit large amounts of unlabeled data, use only a small number of labeled samples, and remain robust to data imbalance, in order to build an efficient, high-quality classifier. In this work, we propose a novel semi-supervised classification technique that is robust to small and unbalanced data. The framework integrates weakly-supervised learning and self-training with self-paced learning to generate attention maps that augment the training set, and it uses a novel pseudo-label generation and selection algorithm to generate and select pseudo-labeled samples. The method improves performance by: (1) normalizing the class-wise confidence levels to prevent the model from ignoring hard-to-learn samples, thereby addressing the imbalanced data problem; (2) jointly learning a model and optimizing the pseudo-labels generated on unlabeled data; and (3) enlarging the training set to meet the data requirements of deep learning models. Extensive evaluations on two public traffic sign recognition datasets demonstrate the effectiveness of the proposed technique and suggest a potential solution for practical applications.
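The idea behind point (1), class-wise confidence normalization, can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a common class-balanced self-training heuristic: each class gets its own confidence threshold, taken as a quantile of the confidences of samples predicted as that class, so that hard-to-learn classes with low confidence still contribute pseudo-labels. The function name `select_pseudo_labels` and the parameter `keep_fraction` are hypothetical.

```python
import numpy as np


def select_pseudo_labels(probs, keep_fraction=0.5):
    """Pick pseudo-labeled samples using per-class confidence thresholds.

    probs: array of shape (n_samples, n_classes) holding softmax outputs
           of the current model on unlabeled data.
    keep_fraction: hypothetical knob; roughly the share of each predicted
           class to keep (a self-paced schedule could raise it per round).
    Returns (selected_indices, pseudo_labels).
    """
    probs = np.asarray(probs, dtype=float)
    preds = probs.argmax(axis=1)   # predicted class per sample
    conf = probs.max(axis=1)       # confidence of that prediction
    n_classes = probs.shape[1]

    # One threshold per class: a quantile of the confidences among the
    # samples predicted as that class. Classes with no predictions keep
    # a threshold of 1.0 (assumption: nothing is selected for them).
    thresholds = np.ones(n_classes)
    for c in range(n_classes):
        class_conf = conf[preds == c]
        if class_conf.size:
            thresholds[c] = np.quantile(class_conf, 1.0 - keep_fraction)

    # Normalize each confidence by its class threshold; a value >= 1 means
    # the sample is confident relative to its own class.
    normalized = conf / thresholds[preds]
    selected = np.flatnonzero(normalized >= 1.0)
    return selected, preds[selected]
```

With a single global threshold such as 0.9, a class whose predictions peak near 0.7 would never be selected. With the per-class thresholds above, the most confident samples of every predicted class are kept, which is one way to keep minority or difficult classes represented in the enlarged training set.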