D3K: The Dissimilarity-Density-Dynamic Radius K-means Clustering Algorithm for scRNA-Seq Data

Authors

Liu Guoyun, Li Manzhi, Wang Hongtao, Lin Shijun, Xu Junlin, Li Ruixi, Tang Min, Li Chun

Affiliations

School of Mathematics and Statistics, Hainan Normal University, Haikou, China.

Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, China.

Publication information

Front Genet. 2022 Jul 1;13:912711. doi: 10.3389/fgene.2022.912711. eCollection 2022.

DOI:10.3389/fgene.2022.912711

PMID:35846121

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9284269/

Abstract

Clustering single-cell sequencing data has long been challenging because the data are high-dimensional and contain many noise points, and the traditional K-means algorithm is poorly suited to data of this type. This study therefore proposes the Dissimilarity-Density-Dynamic Radius K-means (D3K) clustering algorithm. The algorithm introduces a dynamic radius parameter and adjusts the active radius according to the characteristics of the data, which can eliminate the influence of noise points and improve the clustering results. It also computes a weight from the dissimilarity density of the data set, the average contrast of the candidate clusters, and the dissimilarity of the candidate clusters, and uses this weight to obtain a set of high-quality initial center points, which removes the randomness with which K-means selects its initial centers. Compared with similar algorithms, D3K clusters single-cell data more effectively: every clustering index is higher than those of the other single-cell clustering algorithms compared, and the method overcomes the shortcomings of the traditional K-means algorithm.
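The abstract does not state the formulas behind the dynamic radius or the weight, so the following is only an illustrative sketch of the general idea it describes: choose initial centers from dense, mutually distant points inside a data-dependent radius, then run standard K-means from those centers. The radius rule (a fraction of the mean pairwise distance), the neighbor-count density, and the names `density_seeded_centers`, `kmeans`, and `radius_scale` are assumptions made for illustration, not the D3K definitions.

```python
import numpy as np


def density_seeded_centers(X, k, radius_scale=0.5):
    """Pick k initial centers from dense, mutually distant points.

    Assumption: the radius is radius_scale times the mean pairwise
    distance, a simple stand-in for a data-dependent radius.
    """
    n = len(X)
    # Pairwise Euclidean distances, used here as the dissimilarity.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    radius = radius_scale * dist.sum() / (n * (n - 1))

    # Density of a point: number of other points within the radius.
    density = (dist <= radius).sum(axis=1) - 1

    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(k):
        if not available.any():
            raise ValueError("radius too large to place k separated centers")
        # Take the densest point that is still available.
        idx = int(np.argmax(np.where(available, density, -1)))
        chosen.append(idx)
        # Exclude everything within the radius of the new center.
        available &= dist[idx] > radius
    return X[chosen]


def kmeans(X, centers, n_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Calling `kmeans(X, density_seeded_centers(X, k))` on a matrix `X` of shape `(n_cells, n_features)` gives a deterministic result for fixed data, because no random initialization is involved. That determinism is the property the abstract attributes to its initial-center selection, although the actual D3K weighting is more elaborate than this neighbor count.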
