
A mutual neighbor-based clustering method and its medical applications.

Affiliations

Zhejiang Industry Polytechnic College, Shaoxing 312000, PR China.

Zhejiang Normal University, Jinhua 321000, PR China.

Publication information

Comput Biol Med. 2022 Nov;150:106184. doi: 10.1016/j.compbiomed.2022.106184. Epub 2022 Oct 12.

Abstract

Clustering analysis has been widely used in various real-world applications. Because of its simplicity, K-means has become the most popular clustering technique in practice. Unfortunately, its performance relies heavily on the initial centers, which must be specified in advance. Moreover, it cannot effectively identify manifold clusters. In this paper, we propose a novel clustering algorithm based on representative data objects derived from mutual neighbors to identify clusters of different shapes. Specifically, it first obtains mutual neighbors to estimate the density of each data object, and then identifies high-density representative objects to represent the whole data set. In addition, a concept of path distance, derived from a minimum spanning tree, is introduced to measure the similarity of representative objects on manifold structures. Finally, an improved K-means with initial centers and path-based distances is proposed to group the representative objects into clusters. The cluster labels of non-representative objects are determined from neighborhood information. To verify the effectiveness of the proposed method, we conducted comparison experiments on synthetic data and further applied it to medical scenarios. The results show that, compared with state-of-the-art clustering methods, our method can effectively identify arbitrary-shaped clusters and disease types.
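The two building blocks named in the abstract, mutual neighbors and an MST-derived path distance, can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact definitions, so everything here is an assumption: density is taken as the number of mutual k-nearest neighbors, and the path distance is taken as the largest edge on the minimum-spanning-tree path between two points (the common minimax formulation). The function names and the parameter `k` are hypothetical and do not come from the paper.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def mutual_neighbors(D, k):
    """Boolean matrix M with M[i, j] True iff i and j are each among
    the other's k nearest neighbors (assumed definition)."""
    n = D.shape[0]
    D_no_self = D.copy()
    np.fill_diagonal(D_no_self, np.inf)          # a point is not its own neighbor
    nearest = np.argsort(D_no_self, axis=1)[:, :k]
    knn = np.zeros((n, n), dtype=bool)
    knn[np.repeat(np.arange(n), k), nearest.ravel()] = True
    return knn & knn.T                           # keep only reciprocal pairs

def minimax_path_distance(D):
    """Largest edge weight on the MST path between every pair of points.
    The MST is grown with Prim's algorithm; this is an assumed reading of
    'path distance', not necessarily the paper's formula."""
    n = D.shape[0]
    P = np.zeros((n, n))
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].astype(float)                    # cheapest edge from the tree to each vertex
    parent = np.zeros(n, dtype=int)              # tree endpoint of that cheapest edge
    for _ in range(n - 1):
        v = int(np.argmin(np.where(in_tree, np.inf, best)))
        u, w = parent[v], best[v]
        members = np.flatnonzero(in_tree)
        # Path from v to any tree member t goes through u, then along the tree.
        P[v, members] = np.maximum(P[u, members], w)
        P[members, v] = P[v, members]
        in_tree[v] = True
        closer = (D[v] < best) & ~in_tree
        best[closer] = D[v][closer]
        parent[closer] = v
    return P

# Two well-separated chains on a line (toy data, not from the paper).
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
D = pairwise_dist(X)
M = mutual_neighbors(D, k=2)
density = M.sum(axis=1)          # assumed density: count of mutual neighbors
P = minimax_path_distance(D)
```

On this toy data the path distance between the two ends of a chain stays at the chain's step size, while any pair spanning the gap gets the gap width, which is why such a distance suits elongated or manifold-shaped clusters. A K-means-style step over representative objects would then use `P` in place of Euclidean distance; that step is not sketched here because the abstract does not specify it.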

