Winters-Hilt Stephen, Merat Sam
Department of Computer Science, University of New Orleans, LA 70148, USA.
BMC Bioinformatics. 2007 Nov 1;8 Suppl 7(Suppl 7):S18. doi: 10.1186/1471-2105-8-S7-S18.
Support Vector Machines (SVMs) provide a powerful method for classification (supervised learning). Use of SVMs for clustering (unsupervised learning) is now being considered in a number of different ways.
An SVM-based clustering algorithm is introduced that clusters data with no a priori knowledge of the input classes. The algorithm initializes by running a binary SVM classifier against a data set in which each vector is randomly labelled; this is repeated until an initial convergence occurs. Once this initialization step is complete, the SVM confidence parameters for classification of each training instance can be accessed. The lowest-confidence data (i.e., the worst of the mislabelled data) then have their labels switched to the other class. The SVM is re-run on the partly relabelled data set and is guaranteed to converge in this situation, since it converged previously and now carries fewer data points with mislabelling penalties. This approach appears to limit exposure to the local-minima traps that can occur with other approaches. The algorithm thus improves on its weakly convergent result by SVM retraining after each relabelling of the worst of the misclassified vectors, i.e., those feature vectors with confidence-factor values beyond some threshold. Repetition of the above process improves the accuracy, here a measure of separability, until there are no misclassifications. Variations on this type of clustering approach are shown.
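The relabel-and-retrain loop described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: a simple linear classifier trained by subgradient descent on the regularized hinge loss stands in for a full SVM solver, the signed margin y·f(x) is used as the per-point confidence value, only the single worst-confidence point is flipped per round, and the toy two-cluster data, function names, and all parameter values are assumptions.

```python
import random

random.seed(0)

# Toy data: two well-separated 2-D clusters. The true cluster identities
# are never shown to the algorithm.
def make_data(n_per_cluster=20):
    pts = []
    for cx, cy in [(-2.0, -2.0), (2.0, 2.0)]:
        for _ in range(n_per_cluster):
            pts.append((cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5)))
    return pts

def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=200):
    """Full-batch subgradient descent on the regularized hinge loss
    (a minimal stand-in for a proper SVM training routine)."""
    w, b, n = [0.0, 0.0], 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [lam * w[0], lam * w[1]], 0.0
        for (x1, x2), yi in zip(X, y):
            if yi * (w[0] * x1 + w[1] * x2 + b) < 1:  # inside the margin
                gw[0] -= yi * x1 / n
                gw[1] -= yi * x2 / n
                gb -= yi / n
        w[0] -= lr * gw[0]
        w[1] -= lr * gw[1]
        b -= lr * gb
    return w, b

def svm_cluster(X, max_rounds=500):
    # Step 1: random labels -- no a priori class knowledge.
    y = [random.choice([-1, 1]) for _ in X]
    for round_ in range(max_rounds):
        w, b = train_linear_svm(X, y)
        # Signed margin y_i * f(x_i): negative means misclassified,
        # and the most negative value is the lowest-confidence point.
        margins = [yi * (w[0] * x1 + w[1] * x2 + b)
                   for (x1, x2), yi in zip(X, y)]
        worst = min(range(len(X)), key=lambda i: margins[i])
        if margins[worst] >= 0:      # no misclassifications: stop
            return y, round_
        y[worst] = -y[worst]         # flip the worst label, then retrain
    return y, max_rounds

X = make_data()
labels, rounds = svm_cluster(X)
```

Flipping a misclassified point strictly lowers that point's hinge loss, and the subsequent retraining lowers the loss further, which is the intuition behind the convergence argument in the abstract; the final labelling is self-consistent with the trained classifier rather than guaranteed to match the true clusters.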
Non-parametric SVM-based clustering methods may allow for much improved performance over parametric approaches, particularly if they can be designed to inherit the strengths of their supervised SVM counterparts.