Birant Kokten Ulas
Department of Computer Engineering, Dokuz Eylul University, Izmir 35390, Turkey.
Entropy (Basel). 2023 Jan 11;25(1):149. doi: 10.3390/e25010149.
As an entropy-based method, the k-Star algorithm draws on information theory to compute the distances between data instances during classification. k-Star is a machine learning method with high classification performance and strong generalization ability. Nevertheless, as a standard supervised learning method, it learns only from labeled data. This paper proposes an improved method, called semi-supervised k-Star (SSS), which makes efficient predictions by considering unlabeled data in addition to labeled data. Moreover, it introduces a novel semi-supervised learning approach, called holo-training, as an alternative to self-training. Holo-training has the advantage of enabling a powerful and robust model of the data by combining multiple classifiers and using an entropy measure. In extensive experiments, the proposed holo-training approach outperformed the self-training approach on 13 of the 18 datasets. Furthermore, the proposed SSS method achieved higher average accuracy (95.25%) than the state-of-the-art semi-supervised methods (90.01%). The significance of the experimental results was validated using both the Binomial Sign test and the Friedman test.
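The abstract does not say how the entropy measure enters the semi-supervised step, so the following is only a generic illustration and not the paper's holo-training procedure. It sketches one common way an entropy measure can be used in pseudo-labeling: an unlabeled instance is kept only when the predicted class distribution has low Shannon entropy. The function names, the `max_entropy` threshold, and the input layout are all hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete class-probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_pseudo_labels(predictions, max_entropy=0.5):
    """Keep unlabeled instances whose predicted distribution is low-entropy.

    predictions: list of (instance_id, {label: probability}) pairs.
    max_entropy: hypothetical cutoff in bits; not a value from the paper.
    Returns a list of (instance_id, predicted_label) pairs.
    """
    selected = []
    for instance_id, dist in predictions:
        if shannon_entropy(dist.values()) <= max_entropy:
            # Assign the most probable class as the pseudo-label.
            label = max(dist, key=dist.get)
            selected.append((instance_id, label))
    return selected


# Made-up predictions for two unlabeled instances.
example_predictions = [
    ("u1", {"a": 0.95, "b": 0.05}),  # confident prediction, low entropy
    ("u2", {"a": 0.50, "b": 0.50}),  # maximally uncertain, entropy of 1 bit
]
pseudo_labeled = select_pseudo_labels(example_predictions)
```

In a self-training loop, instances selected this way would be added to the labeled set before the classifier is retrained. Only the confident instance `u1` passes the filter here.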