Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, 420 Washington Ave. S.E., Minneapolis, 55455, Minnesota, USA.
Department of Computer Science and Engineering, Santa Clara University, 500 El Camino Real, Santa Clara, 95053, California, USA.
Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad157.
To analyze large, multidimensional single-cell datasets, we describe a method for Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL). As a graph-based clustering method, CosTaL transforms cells with high-dimensional features into a weighted k-nearest-neighbor (kNN) graph. The cells are represented by the vertices of the graph, and an edge between two vertices represents close relatedness between the two cells. Specifically, CosTaL builds an exact kNN graph using cosine similarity and then re-weights the edges with the Tanimoto coefficient as a refining strategy to improve the effectiveness of clustering. We demonstrate that CosTaL generally achieves equivalent or higher effectiveness scores on seven benchmark cytometry datasets and six single-cell RNA-sequencing datasets using six different evaluation metrics, compared with other state-of-the-art graph-based clustering methods, including PhenoGraph, Scanpy and PARC. As indicated by the combined evaluation metrics, CosTaL has high efficiency with small datasets and acceptable scalability for large datasets, which is beneficial for large-scale analysis.
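The graph-construction steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `cosine_knn` and `tanimoto_reweight` are hypothetical, and the abstract does not say which vectors the Tanimoto coefficient is computed on. The sketch assumes each cell is described by its row of cosine-weighted kNN edges and uses the real-valued form T(a, b) = a·b / (|a|² + |b|² − a·b). It builds dense n × n matrices, so it is only suitable for small demonstrations, and the Leiden community-detection step is not shown.

```python
import numpy as np


def cosine_knn(X, k):
    """Exact kNN by cosine similarity.

    Returns (idx, sims), each of shape (n_cells, k): the neighbor indices
    and the cosine similarity to each neighbor, most similar first.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)  # guard against zero vectors
    S = Xn @ Xn.T                              # dense pairwise cosine similarity
    np.fill_diagonal(S, -np.inf)               # a cell is not its own neighbor
    idx = np.argsort(-S, axis=1)[:, :k]
    sims = np.take_along_axis(S, idx, axis=1)
    return idx, sims


def tanimoto_reweight(idx, sims):
    """Re-weight each kNN edge with a Tanimoto coefficient.

    Assumption (not stated in the abstract): each cell is represented by
    its row of the cosine-weighted kNN adjacency matrix.
    Returns a dict mapping undirected edges (i, j), i < j, to weights.
    """
    n, k = idx.shape
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = sims.ravel()
    dots = A @ A.T          # a.b for every pair of rows
    sq = np.diag(dots)      # |a|^2 for every row
    edges = {}
    for i in range(n):
        for j in idx[i]:
            j = int(j)
            denom = sq[i] + sq[j] - dots[i, j]
            weight = dots[i, j] / denom if denom > 0 else 0.0
            edges[(min(i, j), max(i, j))] = float(weight)
    return edges
```

In a full pipeline, the resulting weighted edge list would be passed to a Leiden implementation for community detection. That step is left out here to keep the sketch dependent on NumPy only.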