Sierra A, Corbacho F
Escuela Técnica Superior de Informática, Universidad Autónoma de Madrid, Spain.
Neural Comput. 2000 Nov;12(11):2537-46. doi: 10.1162/089976600300014836.
In some branches of science, such as molecular biology, classes may be defined but not completely trusted. Sometimes posterior analysis proves them to be partially incorrect. Despite its relevance, this phenomenon has not received much attention within the neural computation community. We define reclassification as the task of redefining some given classes by maximum likelihood learning in a model that contains both supervised and unsupervised information. This approach leads to supervised clustering with an additional complexity penalizing term on the number of new classes. As a proof of concept, a simple reclassification algorithm is designed and applied to a data set of gene sequences. To test the performance of the algorithm, two of the original classes are merged. The algorithm is capable of unraveling the original three-class hidden structure, in contrast to the unsupervised version (K-means); moreover, it predicts the subdivision of one of the original classes into two different ones.
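The trade-off described above (fit to the data, agreement with the given classes, and a penalty on the number of classes) can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the model, so the objective below is an assumed stand-in that uses squared error as a proxy for a Gaussian log-likelihood, and the names `penalized_cost`, `alpha`, and `beta` are hypothetical. The toy data mimic the experiment in which two original classes are merged under one label.

```python
from collections import Counter

def penalized_cost(points, given_labels, assignment, alpha=1.0, beta=1.0):
    """Hypothetical objective (an assumption, not taken from the paper):
    within-cluster squared error
    + alpha * (points whose given label differs from their cluster's majority label)
    + beta  * (number of clusters used)."""
    clusters = {}
    for idx, c in enumerate(assignment):
        clusters.setdefault(c, []).append(idx)

    cost = beta * len(clusters)  # complexity penalty on the number of classes
    for members in clusters.values():
        dim = len(points[members[0]])
        centroid = [sum(points[i][d] for i in members) / len(members)
                    for d in range(dim)]
        # Unsupervised term: squared distance of each member to the centroid.
        cost += sum((points[i][d] - centroid[d]) ** 2
                    for i in members for d in range(dim))
        # Supervised term: members that disagree with the majority given label.
        majority = Counter(given_labels[i] for i in members).most_common(1)[0][1]
        cost += alpha * (len(members) - majority)
    return cost

# Toy 1-D data: three well-separated groups, but the first two share label "A",
# as if two original classes had been merged.
points = [(0.0,), (0.2,), (5.0,), (5.2,), (10.0,), (10.2,)]
given = ["A", "A", "A", "A", "B", "B"]

as_given = [0, 0, 0, 0, 1, 1]    # keep the two given classes
three_way = [0, 0, 1, 1, 2, 2]   # split the merged class in two
four_way = [0, 1, 2, 2, 3, 3]    # over-split one group further

cost_given = penalized_cost(points, given, as_given)
cost_three = penalized_cost(points, given, three_way)
cost_four = penalized_cost(points, given, four_way)
```

Under these assumed weights the three-way split scores lower than both the given two-class labeling and the four-way over-split, which is the qualitative behavior the abstract reports. Different choices of `alpha` and `beta` would shift that balance.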