IEEE/ACM Trans Comput Biol Bioinform. 2023 Nov-Dec;20(6):3535-3546. doi: 10.1109/TCBB.2023.3298334. Epub 2023 Dec 25.
Advances in single-cell biotechnologies have made it possible to profile gene expression at the level of individual cells through single-cell RNA sequencing (scRNA-seq), providing an opportunity to study cellular distribution. Although significant effort has been devoted to analyzing these data, many problems remain in studying the distribution of cell types because of the heterogeneity, high dimensionality, and noise of scRNA-seq. In this study, a multi-view clustering with graph learning algorithm (MCGL) for scRNA-seq data is proposed, which consists of multi-view learning, graph learning, and cell-type clustering. Because a single feature space may be inadequate to characterize the functions of cells, MCGL constructs multiple feature spaces and uses multi-view learning to characterize scRNA-seq data from different perspectives. MCGL adaptively learns the similarity graphs of cells, which removes the dependence on a fixed similarity measure and turns scRNA-seq analysis into a multi-view clustering problem. In multi-view learning, MCGL decomposes the cell networks into view-specific and common networks, which better characterizes the topological relationships among cells. MCGL uses multiple types of cell-cell networks simultaneously and exploits the complementarity between these networks to capture the connections between cells and improve clustering performance. Graph learning, graph factorization, and cell-type clustering are carried out simultaneously within one optimization framework. The performance of MCGL is validated on ten scRNA-seq datasets of different scales, and the experimental results indicate that the proposed algorithm significantly outperforms fourteen state-of-the-art scRNA-seq algorithms.
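The abstract does not give the MCGL objective or its update rules, so the following is only a rough illustration of the general idea of multi-view graph clustering, not the authors' method. It is a minimal sketch under several assumptions: two hypothetical views (centered log-transformed expression and a PCA projection), fixed Gaussian kNN graphs instead of adaptively learned ones, a plain average as the "common" network with per-view residuals as the "view-specific" parts, and spectral clustering performed as a separate final step rather than jointly. All function names and the synthetic Poisson data are invented for illustration.

```python
import numpy as np

def knn_graph(X, k=10):
    """Symmetric Gaussian-kernel kNN similarity graph over the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)                      # exclude self-neighbours
    idx = np.argsort(d2, axis=1)[:, :k]               # k nearest cells per row
    sigma2 = np.mean(np.take_along_axis(d2, idx, axis=1)) + 1e-12
    S = np.zeros_like(d2)
    rows = np.repeat(np.arange(X.shape[0]), k)
    S[rows, idx.ravel()] = np.exp(-d2[rows, idx.ravel()] / sigma2)
    return np.maximum(S, S.T)                         # symmetrise

def build_views(counts, n_pcs=10):
    """Two assumed feature spaces: centered log counts and their top PCs."""
    logx = np.log1p(counts)
    centered = logx - logx.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return [centered, U[:, :n_pcs] * s[:n_pcs]]

def spectral_labels(S, n_clusters, n_iter=50):
    """Normalized spectral embedding followed by a small k-means."""
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(S.sum(axis=1), 1e-12))
    A = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(A)                       # eigenvalues ascending
    emb = vecs[:, -n_clusters:].copy()
    emb /= np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12
    centers = [emb[0]]                                # farthest-point init
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((emb - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((emb[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = emb[labels == j].mean(axis=0)
    return labels

def multiview_cluster(counts, n_clusters, k=10):
    graphs = [knn_graph(v, k) for v in build_views(counts)]
    consensus = np.mean(graphs, axis=0)               # stand-in "common" network
    specific = [g - consensus for g in graphs]        # stand-in view-specific parts
    return spectral_labels(consensus, n_clusters), consensus, specific

# Synthetic data: 3 hypothetical cell types, 40 cells each, 50 genes.
rng = np.random.default_rng(0)
truth = np.repeat(np.arange(3), 40)
means = rng.uniform(1, 20, size=(3, 50))
counts = rng.poisson(means[truth]).astype(float)
labels, consensus, specific = multiview_cluster(counts, n_clusters=3)
```

In this simplification the consensus graph is computed once and then clustered. MCGL, as described above, instead learns the graphs, their factorization, and the cluster assignment together in one optimization, which a fixed-graph pipeline like this one does not capture.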