Department of Computer Science, Wake Forest University, Winston-Salem, NC, USA.
School of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, China.
Comput Biol Med. 2022 Dec;151(Pt A):106305. doi: 10.1016/j.compbiomed.2022.106305. Epub 2022 Nov 12.
The rapid development of scRNA-seq technology in recent years has enabled us to capture high-throughput gene expression profiles at single-cell resolution, reveal the heterogeneity of complex cell populations, and greatly advance our understanding of the underlying mechanisms of human diseases. Traditional methods for gene co-expression clustering are limited in their ability to discover effective gene groups in scRNA-seq data. In this paper, we propose a novel gene clustering method based on convolutional neural networks, called Dual-Stream Subspace Clustering Network (DS-SCNet). DS-SCNet can accurately identify important gene clusters from large-scale single-cell RNA-seq data and provide useful information for downstream analysis. On simulated datasets, DS-SCNet successfully clusters genes into different groups and outperforms mainstream gene clustering methods, such as DBSCAN and DESC, across different evaluation metrics. To explore the biological insights our method can provide, we applied it to real scRNA-seq data from patients with Alzheimer's disease (AD). DS-SCNet analyzed the single-cell RNA-seq data with 10,850 genes and accurately identified 8 optimal clusters from 6,673 cells. Enrichment analysis of these gene clusters revealed functional signaling pathways, including the ILS signaling, Rho GTPase signaling, and hemostasis pathways. Further analysis of gene regulatory networks identified new hub genes, such as ELF4, as important regulators of AD, which indicates that DS-SCNet contributes to the discovery and understanding of the pathogenesis of Alzheimer's disease.