School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China.
Key Laboratory of Intelligent Computing in Medical Image (MIIC), Northeastern University, Ministry of Education, Shenyang 110000, China.
Comput Biol Chem. 2023 Oct;106:107924. doi: 10.1016/j.compbiolchem.2023.107924. Epub 2023 Jul 17.
Single-cell RNA sequencing (scRNA-seq) technology reveals gene expression information at the cellular level. The critical tasks in scRNA-seq data analysis are clustering and dimensionality reduction. Recent deep clustering algorithms optimize the two tasks jointly, and a variant of them, graph-based deep clustering algorithms, additionally captures and preserves topological information in the process. However, existing graph-based deep clustering algorithms have two weaknesses. First, they ignore the distribution information of nodes when constructing cell graphs, which leaves the embedding representation incomplete. Second, graph convolutional networks (GCNs), which are most commonly used, often suffer from over-smoothing, which makes samples overly similar in the embedding representation and in turn degrades clustering performance. Here, dual-GCN-based deep clustering with triplet contrast (scDGDC) is proposed for dimensionality reduction and clustering of scRNA-seq data. Its two critical components are a dual-GCN-based encoder, which captures more comprehensive topological information, and triplet contrast, which reduces GCN over-smoothing. The two components improve the dimensionality reduction and clustering performance of scDGDC in terms of information acquisition and model optimization, respectively. Experiments on eight real scRNA-seq datasets showed that scDGDC achieves excellent performance on both clustering and dimensionality reduction tasks and is highly robust to parameters.
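As a rough illustration of the two building blocks named in the abstract, the following NumPy sketch shows a generic GCN propagation step on a k-nearest-neighbour cell graph and a standard triplet margin loss. It is not the scDGDC implementation: the abstract does not specify the dual-graph construction, the network architecture, or the exact form of the triplet contrast, so the function names (`knn_graph`, `normalize_adj`, `gcn_layer`, `triplet_loss`), the Euclidean kNN graph, and the margin value are all assumptions made for this example.

```python
import numpy as np


def knn_graph(x, k):
    """Symmetric k-nearest-neighbour adjacency matrix (Euclidean distance).

    Assumption: a plain kNN graph stands in for the cell graph here.
    """
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a cell is never its own neighbour
    idx = np.argsort(d, axis=1)[:, :k]
    a = np.zeros((len(x), len(x)))
    a[np.repeat(np.arange(len(x)), k), idx.ravel()] = 1.0
    return np.maximum(a, a.T)  # keep an edge if either cell selected the other


def normalize_adj(a):
    """Propagation matrix of a standard GCN: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = a + np.eye(len(a))
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w):
    """One GCN layer: ReLU(A_norm @ H @ W)."""
    return np.maximum(a_norm @ h @ w, 0.0)


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean triplet margin loss: max(d(a, p) - d(a, n) + margin, 0)."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()


# Toy usage on random "expression" data (hypothetical sizes).
rng = np.random.default_rng(0)
cells = rng.normal(size=(20, 50))            # 20 cells x 50 genes
a_norm = normalize_adj(knn_graph(cells, k=5))
weights = rng.normal(size=(50, 8)) * 0.1     # untrained layer weights
embedding = gcn_layer(a_norm, cells, weights)
```

Applying `a_norm` repeatedly averages each cell's features with those of its neighbours, which is the mechanism behind over-smoothing. A triplet term of this kind penalises embeddings in which a negative sample sits no farther from the anchor than a positive one, which is one way to push apart samples that repeated propagation has made too similar.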