IEEE Trans Syst Man Cybern B Cybern. 2011 Dec;41(6):1612-26. doi: 10.1109/TSMCB.2011.2157998. Epub 2011 Jun 23.
Co-training is one of the major semi-supervised learning paradigms: it iteratively trains two classifiers on two different views and uses the predictions of each classifier on the unlabeled examples to augment the training set of the other. During the co-training process, especially in the initial rounds when the classifiers have only mediocre accuracy, it is quite possible that one classifier will receive labels for unlabeled examples that were erroneously predicted by the other classifier. The performance of co-training-style algorithms is therefore usually unstable. In this paper, the problem of how to reliably communicate labeling information between different views is addressed by a novel co-training algorithm named COTRADE. In each labeling round, COTRADE carries out the label communication process in two steps. First, the confidence of each classifier's predictions on the unlabeled examples is explicitly estimated based on specific data editing techniques. Second, each classifier's most confident predicted labels are passed to the other classifier, with certain constraints imposed to avoid introducing undesirable classification noise. Experiments on several real-world datasets across three domains show that COTRADE can effectively exploit unlabeled data to achieve better generalization performance.
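The iterative label-exchange loop described above can be sketched as follows. This is a minimal illustration of generic confidence-filtered co-training, not the COTRADE algorithm itself: COTRADE's data-editing-based confidence estimation and its noise constraints are not reproduced here, and `predict_proba` is used as a stand-in confidence measure. The synthetic two-view data, the pool sizes, and the per-round quota of five labels are all illustrative assumptions.

```python
# Sketch of confidence-filtered co-training (NOT the COTRADE method itself):
# two classifiers, one per view, each passing its most confident predicted
# labels on unlabeled examples to the other classifier's training set.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic two-view data: each view holds 2 features; the label depends
# on both views, so each view carries partial information (illustrative).
n = 400
X1 = rng.normal(size=(n, 2))
X2 = rng.normal(size=(n, 2))
y = ((X1.sum(axis=1) + X2.sum(axis=1)) > 0).astype(int)

# Small labeled pool containing both classes; the rest is unlabeled.
pos = np.where(y == 1)[0][:10]
neg = np.where(y == 0)[0][:10]
labeled = np.concatenate([pos, neg])
unlabeled = np.setdiff1d(np.arange(n), labeled).tolist()

# Each classifier keeps its own (growing) training set of indices/labels.
idx1, y1 = labeled.tolist(), y[labeled].tolist()
idx2, y2 = labeled.tolist(), y[labeled].tolist()

for _ in range(5):  # labeling rounds
    c1 = GaussianNB().fit(X1[idx1], y1)
    c2 = GaussianNB().fit(X2[idx2], y2)
    if not unlabeled:
        break
    U = np.array(unlabeled)
    # Each classifier labels the unlabeled pool on ITS view and passes its
    # 5 most confident predictions to the OTHER classifier's training set.
    for clf, Xv, dst_idx, dst_y in ((c1, X1, idx2, y2), (c2, X2, idx1, y1)):
        conf = clf.predict_proba(Xv[U]).max(axis=1)
        top = U[np.argsort(conf)[-5:]]
        dst_idx.extend(top.tolist())
        dst_y.extend(clf.predict(Xv[top]).tolist())
    used = set(idx1) | set(idx2)
    unlabeled = [i for i in unlabeled if i not in used]

print(f"view-1 training set grew from {len(labeled)} to {len(idx1)} examples")
```

Without the confidence filtering (or COTRADE's stronger data-editing step), each round would inject the other classifier's raw predictions, including its mistakes, into the training set; this is the accumulating label noise the abstract identifies as the source of co-training's instability.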