Comparison of hierarchical clustering and neural network clustering: an analysis on precision dominance.

Affiliation

Department of Mathematics, Forman Christian College (A Chartered University), Lahore, Pakistan.

Publication information

Sci Rep. 2023 Apr 6;13(1):5661. doi: 10.1038/s41598-023-32790-3.

PMID:37024621

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10079863/

Abstract

A comparison of neural network clustering (NNC) and hierarchical clustering (HC) is conducted to assess the computational dominance of these two machine learning (ML) methods in classifying a large dataset with many variables into clusters. An accurate clustering arrangement is essential for investigating the collective influence of predictors on a system over time. Moreover, a categorical representation of variables can help reduce a wide dataset without losing essential knowledge of the system. For NNC, a self-organizing map (SOM) was trained on data from a local aquatic system to learn the distribution and topology of the variables in the input space. Three SOM features (sample hits, neighbouring weight distances, and weight planes) were examined to draw a visual inference of the system's structural attributes. For HC, the data were partitioned through a coupled dissimilarity-linkage matrix operation, and this approach was validated by a higher value of the cophenetic coefficient. Additionally, the HC stem-division feature was used to determine cluster boundaries. The SOM visuals showed a remarkable similarity in concentration between the samples from two locations, and revealed 4 extremely out-of-range concentration parameters among the 16 samples. The NNC analysis also demonstrated that the individual behaviour of 18 independent components over time can be examined comparably through the aggregate influence of 6 clusters containing these components. However, HC analysis retrieved a precise number of 7 clusters for segmenting the system, and the constituent elements of each cluster are listed explicitly. It is concluded that simultaneously categorizing the system's predictors (water components) and inputs (locations) through both NNC and HC is valid to a precision probability of 0.8, compared with data segmentation performed with either method alone.
It is also established that forming clusters by combining HC's linkage and dissimilarity algorithms with NNC is more reliable than visual assessment of NNC alone, since changing the SOM map size alters how the input weights are associated with neurons and yields a different consolidation of clusters.
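
The HC workflow summarized in the abstract (a dissimilarity matrix, a linkage rule, validation by the cophenetic coefficient, and a cut of the dendrogram into clusters) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the Euclidean distance, the average (UPGMA) linkage, the fixed cluster count used in place of the stem-division rule, the hypothetical helper names `average_linkage`, `pearson` and `cut_tree`, and the six toy samples are all assumptions, because the abstract specifies none of them. The cophenetic coefficient is computed as the Pearson correlation between the original pairwise distances and the merge heights at which each pair first joins the same cluster; a value close to 1 means the dendrogram preserves the original dissimilarities well.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean dissimilarity between two samples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points):
    """Agglomerative clustering with average (UPGMA) linkage (assumed linkage rule).

    Returns the pairwise dissimilarities, the cophenetic distance of every
    pair, and the merge history as (members_a, members_b, height) tuples.
    """
    n = len(points)
    dist = {(i, j): euclidean(points[i], points[j])
            for i, j in combinations(range(n), 2)}
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    coph, merges, next_id = {}, [], n

    def key(p, q):
        return (p, q) if p < q else (q, p)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest mean inter-member distance.
        best = None
        for a, b in combinations(sorted(clusters), 2):
            total = sum(dist[key(p, q)] for p in clusters[a] for q in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Every pair first joined by this merge gets the merge height as its cophenetic distance.
        for p in clusters[a]:
            for q in clusters[b]:
                coph[key(p, q)] = d
        merges.append((clusters[a], clusters[b], d))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return dist, coph, merges


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cut_tree(n, merges, k):
    """Label each sample with one of k clusters by replaying the first n - k merges.

    Average-linkage heights never decrease, so this matches cutting the
    dendrogram where exactly k branches remain (a stand-in for stem division).
    """
    label = list(range(n))
    for members_a, members_b, _ in merges[: n - k]:
        target = label[members_a[0]]
        for p in members_b:
            label[p] = target
    relabel = {}
    return [relabel.setdefault(lab, len(relabel)) for lab in label]


# Made-up toy data (not from the study): two well-separated groups of three samples.
points = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (8.0, 8.5), (8.2, 8.1), (7.9, 8.3)]
dist, coph, merges = average_linkage(points)
pairs = sorted(dist)
r = pearson([dist[p] for p in pairs], [coph[p] for p in pairs])
labels = cut_tree(len(points), merges, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(f"cophenetic correlation: {r:.3f}")
```

For real data, SciPy's `scipy.cluster.hierarchy` module provides the same steps (`linkage`, `cophenet`, `fcluster`); the hand-written loop here only makes each step visible and scales poorly beyond small datasets.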
