University of the South Pacific, Suva, Fiji; University of Picardie Jules Verne, Amiens, France.
Università degli Studi di Siena, DIISM, Siena, Italy.
Neural Netw. 2020 Jan;121:57-73. doi: 10.1016/j.neunet.2019.07.018. Epub 2019 Jul 31.
Hierarchical clustering is an important tool for extracting information from data at multiple resolutions. It is most meaningful when driven by the data, as in divisive algorithms, which keep splitting the data until no further division is allowed. Their drawback is that a splitting threshold must be set. Neural networks can address this problem because they depend essentially on the data. The growing hierarchical GH-EXIN neural network builds a hierarchical tree in an incremental (data-driven architecture) and self-organized way. It is a top-down technique that defines horizontal growth by means of an anisotropic region of influence, based on the novel idea of a neighborhood convex hull. It also reallocates data and detects outliers using a novel approach applied to all the leaves simultaneously. Its complexity is estimated, and its user-dependent parameters are analyzed. The advantages of the proposed approach over the best existing networks are shown and analyzed, qualitatively and quantitatively, on both synthetic benchmark problems and a real application (image recognition from video), in order to test its performance in building hierarchical trees. Finally, an important and very promising application of GH-EXIN to two-way hierarchical clustering is described: the analysis of gene expression data in the study of colorectal cancer.
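To make the splitting-threshold drawback concrete, here is a minimal sketch of a generic top-down divisive procedure. It is not GH-EXIN and is not taken from the paper: it uses a plain bisecting 2-means split as a stand-in, and every name in it (`divisive_tree`, `two_means`, `threshold`, `min_size`) is a hypothetical choice made for illustration. A node is split only while its mean squared distance to its centroid exceeds the user-chosen `threshold`, so the shape of the resulting tree depends directly on that setting.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def mean_sq_error(points):
    """Mean squared distance of the points to their centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points) / len(points)


def two_means(points, iters=20, seed=0):
    """Split the points into two groups with a few Lloyd (k-means, k=2) iterations."""
    rng = random.Random(seed)
    c0, c1 = rng.sample(points, 2)
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not left or not right:
            break
        c0, c1 = centroid(left), centroid(right)
    return left, right


def divisive_tree(points, threshold, min_size=2):
    """Recursively split a node while its mean squared error exceeds the threshold."""
    node = {"points": points, "children": []}
    if len(points) < max(min_size, 2) or mean_sq_error(points) <= threshold:
        return node  # compact enough (or too small): keep as a leaf
    left, right = two_means(points)
    if left and right:
        node["children"] = [
            divisive_tree(left, threshold, min_size),
            divisive_tree(right, threshold, min_size),
        ]
    return node


def count_leaves(node):
    """Number of leaves in the tree rooted at this node."""
    if not node["children"]:
        return 1
    return sum(count_leaves(child) for child in node["children"])


if __name__ == "__main__":
    # Two small, well-separated groups of 2-D points (made-up toy data).
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    for t in (1000.0, 1.0, 0.1):
        print(t, count_leaves(divisive_tree(data, threshold=t)))
```

Running the script on the toy data shows the same points producing trees with different numbers of leaves as `threshold` changes, which is the sensitivity that a data-driven approach aims to avoid.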