Ressom Habtom, Wang Dali, Natarajan Padma
Intelligent Systems Laboratory, Department of Electrical and Computer Engineering, University of Maine, Orono, Maine 04469, USA.
Physiol Genomics. 2003 Jun 24;14(1):35-46. doi: 10.1152/physiolgenomics.00138.2002.
This paper presents a novel clustering technique known as the adaptive double self-organizing map (ADSOM). ADSOM has a flexible topology and performs clustering and cluster visualization simultaneously, thereby requiring no a priori knowledge about the number of clusters. ADSOM is based on a recently introduced technique known as the double self-organizing map (DSOM). DSOM combines features of the popular self-organizing map (SOM) with two-dimensional position vectors, which serve as a visualization tool for deciding how many clusters are needed. Although DSOM addresses the problem of identifying an unknown number of clusters, its free parameters are difficult to control in a way that guarantees correct results and convergence. ADSOM updates its free parameters during training, and it allows its position vectors to converge to a fairly consistent number of clusters, provided that its initial number of nodes is greater than the expected number of clusters. The number of clusters can be identified by visually counting the clusters formed by the position vectors after training. A novel index is introduced based on hierarchical clustering of the final locations of the position vectors. The index allows automated detection of the number of clusters, thereby reducing the human error that visual counting could introduce. The reliability of ADSOM in identifying the number of clusters is demonstrated by applying it to publicly available gene expression data from multiple biological systems such as yeast, human, and mouse. ADSOM's performance in detecting the number of clusters is compared with that of a model-based clustering method.
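The idea of counting clusters automatically from the final position vectors can be illustrated with a minimal sketch. The abstract does not define the paper's index, so the code below substitutes a common heuristic as a stand-in: it runs single-linkage hierarchical clustering on the two-dimensional position vectors and cuts the hierarchy at the largest jump between consecutive merge distances. The function names, the choice of single linkage, and the sample coordinates are all assumptions made for illustration, not the authors' method.

```python
from math import dist


def single_linkage_merge_heights(points):
    """Return the merge distances of single-linkage agglomerative clustering.

    Hypothetical helper: a naive implementation kept short for readability,
    suitable only for the small number of nodes in a map.
    """
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        heights.append(d)
    return heights


def estimate_cluster_count(points):
    """Estimate the number of clusters from the largest merge-distance gap.

    Assumption: this gap heuristic stands in for the paper's index,
    which the abstract does not specify.
    """
    heights = single_linkage_merge_heights(points)
    if len(heights) < 2:
        # Too few points to compare merge distances; report one cluster.
        return 1
    gaps = [heights[k + 1] - heights[k] for k in range(len(heights) - 1)]
    k = max(range(len(gaps)), key=gaps.__getitem__)
    # Cutting after merge k (0-based) leaves n - (k + 1) clusters.
    return len(points) - (k + 1)


# Made-up final position vectors forming three tight groups.
positions = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0), (10.1, 0.0), (10.0, 0.1),
]
print(estimate_cluster_count(positions))
```

A gap-based cut of this kind works only when the position vectors have collapsed into well-separated groups, which is the behavior the abstract attributes to ADSOM after training. The heuristic also cannot distinguish a single cluster from several, since it always cuts at some gap.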