Trosset Michael W, Priebe Carey E, Park Youngser, Miller Michael I
Department of Statistics, Indiana University, Bloomington, IN 47405, USA.
Comput Stat Data Anal. 2008 Jun 15;52(10):4643-4657. doi: 10.1016/j.csda.2008.02.030.
The following two-stage approach to learning from dissimilarity data is described: (1) embed both labeled and unlabeled objects in a Euclidean space; then (2) train a classifier on the labeled objects. The use of linear discriminant analysis for (2), which naturally invites the use of classical multidimensional scaling for (1), is emphasized. The choice of the dimension of the Euclidean space in (1) is a model selection problem; too few or too many dimensions can degrade classifier performance. The question of how the inclusion of unlabeled objects in (1) affects classifier performance is investigated. In the case of spherical covariances, including unlabeled objects in (1) is demonstrably superior. Several examples are presented.
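As a rough illustration of the two-stage approach described above, the following sketch embeds all objects (labeled and unlabeled) with classical multidimensional scaling and then fits a linear discriminant rule using only the labeled points. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`classical_mds`, `fit_lda`, `predict_lda`, `two_stage_demo`), the synthetic Gaussian data, the embedding dimension `d=2`, and the number of labeled objects are all hypothetical choices made for illustration.

```python
import numpy as np

def classical_mds(D, d):
    """Embed n objects in R^d from an n x n dissimilarity matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared dissimilarities
    w, V = np.linalg.eigh(B)                   # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:d]              # indices of the d largest eigenvalues
    w_top = np.clip(w[idx], 0.0, None)         # guard against small negative values
    return V[:, idx] * np.sqrt(w_top)

def fit_lda(X, y):
    """Fit linear discriminant analysis with a pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    S = sum((X[y == c] - m).T @ (X[y == c] - m) for c, m in zip(classes, means))
    S = np.atleast_2d(S / (len(y) - len(classes)))
    P = np.linalg.pinv(S)                      # (pseudo-)inverse of pooled covariance
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, P, priors

def predict_lda(model, X):
    """Assign each row of X to the class with the largest linear discriminant score."""
    classes, means, P, priors = model
    scores = X @ P @ means.T - 0.5 * np.sum(means @ P * means, axis=1) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]

def two_stage_demo(seed=0, n_per_class=60, n_labeled_per_class=10, d=2):
    """Synthetic example (assumed data): two spherical Gaussian classes in R^5."""
    rng = np.random.default_rng(seed)
    shift = np.array([6.0, 0.0, 0.0, 0.0, 0.0])
    X = np.vstack([rng.normal(size=(n_per_class, 5)),
                   rng.normal(size=(n_per_class, 5)) + shift])
    y = np.repeat([0, 1], n_per_class)
    labeled = np.zeros(2 * n_per_class, dtype=bool)
    labeled[:n_labeled_per_class] = True
    labeled[n_per_class:n_per_class + n_labeled_per_class] = True

    # Only pairwise dissimilarities are passed on to the embedding step.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    Z = classical_mds(D, d)                    # stage 1: embed labeled + unlabeled
    model = fit_lda(Z[labeled], y[labeled])    # stage 2: train on labeled only
    pred = predict_lda(model, Z[~labeled])
    return float(np.mean(pred == y[~labeled]))

if __name__ == "__main__":
    print("accuracy on unlabeled objects:", two_stage_demo())
```

Varying `d` in `two_stage_demo` gives a simple way to observe the model selection issue the abstract mentions, and restricting `D` to the labeled rows and columns before calling `classical_mds` gives a comparison against an embedding that ignores the unlabeled objects.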