Mavroudi Seferina, Papadimitriou Stergios, Bezerianos Anastasios
Department of Medical Physics, School of Medicine, University of Patras, 26500 Patras, Greece.
Bioinformatics. 2002 Nov;18(11):1446-53. doi: 10.1093/bioinformatics/18.11.1446.
Clustering is currently the most popular approach to analyzing genome-wide expression data. A major drawback of most existing clustering methods is that the number of clusters must be specified a priori. Furthermore, purely unsupervised algorithms ignore prior biological knowledge entirely. Moreover, most current tools lack an effective framework for tightly integrating unsupervised and supervised learning in the analysis of high-dimensional expression data, and very few multi-class supervised approaches are designed to make effective use of multiple functional class labels.
The paper adapts a novel self-organizing map, the supervised Network Self-Organized Map (sNet-SOM), to the peculiarities of multi-labeled gene expression data. The sNet-SOM adaptively determines the number of clusters through a dynamic extension process. This process is driven by an inhomogeneous measure that tries to balance unsupervised, supervised and model complexity criteria. Nodes within a rectangular grid are grown at the boundary nodes, weights are rippled from the internal nodes towards the outer nodes of the grid, and whole columns are inserted within the map. The appropriate level of expansion is determined automatically. Multiple sNet-SOM models are constructed dynamically, each for a different unsupervised/supervised balance, and model selection criteria are used to select the optimal one. The results indicate that sNet-SOM performs competitively with other recently proposed approaches for supervised classification at a significantly reduced computational cost, and that it offers extensive potential for exploratory analysis within the same framework. Furthermore, it relies on simple design decisions that are easy to comprehend and computationally efficient.
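As a rough illustration of how a node-level criterion might balance unsupervised and supervised terms when deciding where to grow the map, the sketch below mixes a quantization error with a label entropy computed over multi-labeled genes. It is not the authors' formulation: the weighting parameter `alpha`, the `threshold`, the dictionary layout of a node, and all function names are assumptions made for illustration, and the model complexity criterion mentioned in the abstract is not modeled.

```python
import math
from collections import Counter


def label_entropy(label_sets):
    """Shannon entropy (in bits) of the class labels pooled over the genes
    mapped to one node. A gene may carry several labels; each counts once.
    (Assumed supervised term, not taken from the paper.)"""
    counts = Counter(label for labels in label_sets for label in labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def quantization_error(vectors, weight):
    """Mean squared Euclidean distance between a node's weight vector and
    the expression vectors mapped to it. (Assumed unsupervised term.)"""
    if not vectors:
        return 0.0
    squared = (sum((x - w) ** 2 for x, w in zip(v, weight)) for v in vectors)
    return sum(squared) / len(vectors)


def node_score(node, alpha):
    """Hypothetical combined criterion: alpha = 1 is purely unsupervised,
    alpha = 0 is purely supervised."""
    unsupervised = quantization_error(node["vectors"], node["weight"])
    supervised = label_entropy(node["labels"])
    return alpha * unsupervised + (1 - alpha) * supervised


def pick_node_to_expand(nodes, alpha, threshold):
    """Return the index of the highest-scoring node, or None when no node
    exceeds the threshold (that is, when growth should stop)."""
    best_score, best_index = max(
        (node_score(node, alpha), i) for i, node in enumerate(nodes)
    )
    return best_index if best_score > threshold else None


# Two toy nodes: the first is label-pure, the second mixes two classes.
nodes = [
    {"weight": [0.0, 0.0], "vectors": [[0.0, 0.0]], "labels": [["A"], ["A"]]},
    {"weight": [1.0, 1.0], "vectors": [[1.0, 1.0]], "labels": [["A"], ["B"]]},
]
chosen = pick_node_to_expand(nodes, alpha=0.5, threshold=0.1)
```

Building one model per value of `alpha` and comparing them afterwards would loosely mirror the idea of constructing several models for different unsupervised/supervised balances, though the selection criteria themselves are not specified in the abstract.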