Sigdel Madhav, Dinç İmren, Dinç Semih, Sigdel Madhu S, Pusey Marc L, Aygün Ramazan S
DataMedia Research Lab, Department of Computer Science, University of Alabama in Huntsville, Huntsville, Alabama 35899, United States.
iXpressGenes, Inc., 601 Genome Way, Huntsville, Alabama 35806, United States.
Proc IEEE Southeastcon. 2014 Mar;2014. doi: 10.1109/SECON.2014.6950649.
In this paper, we investigate the performance of two wrapper methods for semi-supervised learning in classifying protein crystallization images when only a limited number of labeled images is available. First, we evaluate a semi-supervised approach based on self-training, with naïve Bayes (NB) and sequential minimal optimization (SMO) as the base classifiers. The confidence values returned by these classifiers are used to select high-confidence predictions, which are then added to the training data for self-training. Second, we analyze the performance of Yet Another Two Stage Idea (YATSI) semi-supervised learning using NB, SMO, multilayer perceptron (MLP), J48, and random forest (RF) classifiers. These results are compared with basic supervised learning on the same training sets. We perform our experiments on a dataset of 2250 protein crystallization images, using different proportions of training and test data. Our results indicate that NB and SMO achieve higher accuracy with both self-training and YATSI than with supervised learning. In contrast, MLP, J48, and RF perform better with basic supervised learning. Overall, the random forest classifier with supervised learning yields the best accuracy on our dataset.
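The self-training wrapper described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the hand-written Gaussian naïve Bayes, the function names (`fit_gaussian_nb`, `predict_proba`, `self_train`), the confidence threshold of 0.95, the round limit, and the synthetic two-dimensional features are all assumptions made for illustration. The paper works with features extracted from crystallization images, which are not reproduced here.

```python
import math
import random

def fit_gaussian_nb(X, y):
    """Fit per-class prior, feature means and variances (Gaussian naive Bayes)."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(c) / n for c in cols]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / n + 1e-6
                     for c, m in zip(cols, means)]
        model[label] = (n / len(y), means, variances)
    return model

def predict_proba(model, x):
    """Return {label: posterior probability} for one feature vector."""
    log_post = {}
    for label, (prior, means, variances) in model.items():
        lp = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_post[label] = lp
    top = max(log_post.values())  # subtract the max for numerical stability
    unnorm = {k: math.exp(v - top) for k, v in log_post.items()}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, max_rounds=10):
    """Self-training: move confident predictions into the labeled set, refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = fit_gaussian_nb(X_lab, y_lab)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for x in pool:
            proba = predict_proba(model, x)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:  # assumed confidence cutoff
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:  # nothing confident enough: stop early
            break
        model = fit_gaussian_nb(X_lab, y_lab)
    return model, len(X_lab)

# Synthetic stand-in data: two well-separated clusters, few labels per class.
rng = random.Random(0)
def sample(center, n):
    return [(rng.gauss(center, 1.0), rng.gauss(center, 1.0)) for _ in range(n)]

X_labeled = sample(0.0, 5) + sample(5.0, 5)
y_labeled = [0] * 5 + [1] * 5
X_unlabeled = sample(0.0, 100) + sample(5.0, 100)

model, n_labeled = self_train(X_labeled, y_labeled, X_unlabeled)
print("labeled set size after self-training:", n_labeled)
```

The threshold controls the trade-off that self-training depends on: a high value admits fewer but cleaner pseudo-labels, while a low value grows the training set faster at the risk of reinforcing early mistakes. Any base classifier that returns a confidence value, such as SMO with probability estimates, could take the place of the naïve Bayes model in this loop.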