Yip Kevin Y, Cheung Lin, Cheung David W, Jing Liping, Ng Michael K
Department of Computer Science, Yale University, New Haven, Connecticut, USA.
Int J Data Min Bioinform. 2009;3(3):229-59. doi: 10.1504/ijdmb.2009.026700.
Recent studies have suggested that extremely low-dimensional projected clusters exist in real datasets. Here, we propose a new algorithm for identifying them. It combines object clustering and dimension selection, and allows domain knowledge to be supplied as input to guide the clustering process. Theoretical and experimental results show that even a small amount of input knowledge can help detect clusters with only 1% of the relevant dimensions. We also show that this semi-supervised algorithm can perform knowledge-guided selective clustering when there are multiple meaningful object groupings. The algorithm is also shown to be effective in analysing a microarray dataset.