Fernandez Elmer A, Balzarini Monica
Faculty of Engineering, Catholic University of Córdoba, Córdoba, Camino Alta Gracia Km 10, Cordoba, Argentina.
Comput Biol Med. 2007 Dec;37(12):1677-89. doi: 10.1016/j.compbiomed.2007.04.003. Epub 2007 Jun 4.
Cluster analysis is one of the crucial steps in gene expression pattern (GEP) analysis. It leads to the discovery or identification of temporal patterns and coexpressed genes. GEP analysis involves high-dimensional multivariate data, which demand appropriate tools. A good alternative for grouping many multidimensional objects is the self-organizing map (SOM), an unsupervised neural network algorithm able to find relationships among data. A SOM groups the data and maps them topologically. However, it may be difficult to identify clusters with the usual visualization tools for SOM. We propose a simple algorithm to identify and visualize clusters in SOM (the RP-Q method). The RP is a new node-adaptive attribute that moves in a two-dimensional virtual space, imitating the movement of the codebook vectors of the SOM net in the input space. The Q statistic evaluates the SOM structure, providing an estimate of the number of clusters underlying the data set. The SOM-RP-Q algorithm permits the visualization of clusters in the SOM and their node patterns. The algorithm was evaluated on several simulated and real GEP data sets. Results show that the proposed algorithm successfully displays the underlying cluster structure directly from the SOM and is robust to different net sizes.
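For readers unfamiliar with the SOM step that the abstract builds on, the following is a minimal sketch of generic SOM training in plain Python. It is not the RP-Q method: the abstract does not specify the RP update rule or the Q statistic, so neither is implemented here. The function names (`train_som`, `best_matching_unit`), the linear decay schedules, the Gaussian neighbourhood, and all default parameter values are assumptions chosen for illustration only.

```python
import math
import random


def best_matching_unit(codebook, x):
    """Return the grid coordinate of the node whose codebook vector is closest to x."""
    return min(
        codebook,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(codebook[node], x)),
    )


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM (illustrative sketch, assumed defaults).

    data: non-empty list of equal-length numeric vectors.
    Returns a dict mapping grid coordinates (row, col) to codebook vectors.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise every codebook vector as a copy of a randomly chosen sample.
    codebook = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            bmu = best_matching_unit(codebook, x)
            for node, w in codebook.items():
                # Squared distance on the map grid, not in the input space.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                # Move the codebook vector toward the sample.
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return codebook


if __name__ == "__main__":
    # Toy data: two synthetic groups of 4-dimensional "expression profiles".
    demo_rng = random.Random(42)
    group_a = [[demo_rng.gauss(0.0, 0.3) for _ in range(4)] for _ in range(20)]
    group_b = [[demo_rng.gauss(5.0, 0.3) for _ in range(4)] for _ in range(20)]
    som = train_som(group_a + group_b, rows=3, cols=3)
    print("BMU of a group A profile:", best_matching_unit(som, group_a[0]))
    print("BMU of a group B profile:", best_matching_unit(som, group_b[0]))
```

Each update moves codebook vectors toward a sample in the input space, with neighbouring map nodes moving together. This is the movement that, according to the abstract, the RP attribute imitates in a two-dimensional virtual space.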