Learning deep features and topological structure of cells for clustering of scRNA-sequencing data.

Affiliation

School of Computer Science and Technology, Xidian University, Xi'an, 710071, China.

Publication information

Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac068.

Abstract

Single-cell RNA sequencing (scRNA-seq) measures the transcriptome at the single-cell level, paving the way for the identification of cell subpopulations. Although deep learning has been successfully applied to scRNA-seq data, these algorithms are criticized for poor performance and limited interpretability of the patterns they find, owing to the noise, high dimensionality and extreme sparsity of scRNA-seq data. To address these issues, a novel deep learning subspace clustering algorithm, scGDC, is proposed for identifying cell types in scRNA-seq data; it simultaneously learns the deep features and the topological structure of cells. Specifically, scGDC extends the auto-encoder by introducing a self-representation layer to extract deep features of cells and learns an affinity graph of cells, which provides a better and more comprehensive strategy for characterizing the structure of cell types. To address the heterogeneity of scRNA-seq data, scGDC projects cells of various types onto different subspaces, where cell types, particularly rare ones, are well discriminated by means of generative adversarial learning. Furthermore, scGDC jointly performs deep feature extraction, structural learning and cell type discovery, so that features of cells are extracted under the guidance of cell types, thereby improving performance. A total of 15 scRNA-seq datasets from various tissues and organisms, with the number of cells ranging from 56 to 63,103, are used to validate the algorithm. Experimental results demonstrate that scGDC significantly outperforms 14 state-of-the-art methods on various measures (an average improvement of 25.51%), and that (rare) cell types are significantly associated with the topology of the affinity graph of cells. The proposed model and algorithm provide an effective strategy for the analysis of scRNA-seq data. The software is written in Python and is freely available for academic use at https://github.com/xkmaxidian/scGDC.
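
The self-representation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the scGDC implementation: it assumes the latent features Z (one row per cell) are already given, replaces the trainable self-representation layer with a closed-form ridge-regularized solution, and omits the auto-encoder and the adversarial component entirely. The function names, the penalty `lam` and the toy data are hypothetical.

```python
import numpy as np

def self_representation(Z, lam=0.1):
    """Ridge self-representation: argmin_C ||Z - C Z||_F^2 + lam * ||C||_F^2.

    Z has shape (n_cells, n_features). Each cell is expressed as a linear
    combination of all cells. This closed form is an illustrative stand-in
    for a learned self-representation layer (an assumption of this sketch).
    """
    G = Z @ Z.T                                   # Gram matrix between cells
    n = G.shape[0]
    # Stationarity gives C (G + lam I) = G. G commutes with (G + lam I)^-1,
    # so C can be computed as (G + lam I)^-1 G.
    C = np.linalg.solve(G + lam * np.eye(n), G)
    # Heuristic post-processing (assumption): drop each cell's own coefficient
    # so that a cell is described only by other cells.
    np.fill_diagonal(C, 0.0)
    return C

def affinity_graph(C):
    """Symmetric, non-negative affinity between cells built from C."""
    absC = np.abs(C)
    return 0.5 * (absC + absC.T)

# Toy data: two groups of three cells lying in orthogonal 1-D subspaces.
rng = np.random.default_rng(0)
Z = np.zeros((6, 2))
Z[:3, 0] = rng.uniform(1.0, 2.0, size=3)   # group 1 along the first axis
Z[3:, 1] = rng.uniform(1.0, 2.0, size=3)   # group 2 along the second axis

A = affinity_graph(self_representation(Z))
print(np.round(A, 3))
```

On this toy input the affinity matrix is block-structured: cells in the same subspace are connected and cells in different subspaces are not. A matrix of this kind could then be passed to a graph-based clustering method, for example scikit-learn's `SpectralClustering` with `affinity="precomputed"`, to obtain cluster labels.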

