Lai Jie, Wang Xiaodan, Xiang Qian, Quan Wen, Song Yafei
College of Air and Missile Defense, Air Force Engineering University, Xi'an 710051, China.
College of Air Traffic Control and Navigation, Air Force Engineering University, Xi'an 710051, China.
Entropy (Basel). 2023 Aug 30;25(9):1274. doi: 10.3390/e25091274.
The low efficiency and cognitive limitations of manual sample labeling leave a large number of training samples unlabeled in practical applications. Making full use of both labeled and unlabeled samples is the key to solving the semi-supervised problem. However, as a supervised algorithm, the stacked autoencoder (SAE) considers only labeled samples and is difficult to apply to semi-supervised problems. Thus, by introducing the pseudo-labeling method into the SAE, a novel pseudo label-based semi-supervised stacked autoencoder (PL-SSAE) is proposed to address semi-supervised classification tasks. The PL-SSAE first uses unsupervised pre-training of the autoencoder (AE) on all samples to initialize the network parameters. Then, the network parameters are iteratively fine-tuned on the labeled samples, the unlabeled samples are classified, and their pseudo labels are generated. Finally, the pseudo-labeled samples are used to construct a regularization term and fine-tune the network parameters, completing the training of the PL-SSAE. Unlike the traditional SAE, the PL-SSAE uses all samples in pre-training and the pseudo-labeled unlabeled samples in fine-tuning, so that it fully exploits the feature and category information of the unlabeled samples. Empirical evaluations on various benchmark datasets show that the semi-supervised performance of the PL-SSAE is more competitive than that of the SAE, sparse stacked autoencoder (SSAE), semi-supervised stacked autoencoder (Semi-SAE) and semi-supervised sparse stacked autoencoder (Semi-SSAE).
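The pseudo-labeling step and the regularized fine-tuning objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the regularization term, so the following are assumptions made only for illustration: the pseudo label is taken as the arg-max class of the network's softmax output, the regularization term is a cross-entropy over the pseudo-labeled samples, and it is weighted by a hypothetical coefficient `lam`. The function names (`pseudo_labels`, `pl_loss`) are likewise hypothetical and not taken from the paper.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-probability assigned to the given class index."""
    return -math.log(max(probs[label], 1e-12))

def pseudo_labels(unlabeled_probs):
    """Assumed rule: the pseudo label is the most probable predicted class."""
    return [max(range(len(p)), key=p.__getitem__) for p in unlabeled_probs]

def pl_loss(labeled_probs, labels, unlabeled_probs, pseudo, lam=0.5):
    """Supervised loss plus an assumed pseudo-label regularization term."""
    sup = sum(cross_entropy(p, y) for p, y in zip(labeled_probs, labels)) / len(labels)
    reg = sum(cross_entropy(p, y) for p, y in zip(unlabeled_probs, pseudo)) / len(pseudo)
    return sup + lam * reg

# Toy network outputs for two labeled and two unlabeled samples (two classes).
labeled_probs = [softmax([2.0, 0.0]), softmax([0.0, 1.0])]
labels = [0, 1]
unlabeled_probs = [softmax([0.2, 1.5]), softmax([3.0, -1.0])]

pseudo = pseudo_labels(unlabeled_probs)
supervised_only = pl_loss(labeled_probs, labels, unlabeled_probs, pseudo, lam=0.0)
total = pl_loss(labeled_probs, labels, unlabeled_probs, pseudo, lam=0.5)
```

In an actual training loop, the pseudo labels would come from the network fine-tuned on the labeled samples, and the combined loss would then be minimized by gradient descent over the pre-trained parameters. The sketch only evaluates the objective on fixed outputs.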