scLEGA: an attention-based deep clustering method with a tendency for low expression of genes on single-cell RNA-seq data.

Affiliations

Aulin College, Northeast Forestry University 150040, 26 Hexing Road, Xiangfang District, Harbin, China.

Key Laboratory of Hepatosplenic Surgery, Ministry of Education, Department of General Surgery, the First Affiliated Hospital of Harbin Medical University 150001, 23 Postal Street, Nangang District, Harbin, China.

Publication information

Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae371.

Abstract

Single-cell RNA sequencing (scRNA-seq) enables the exploration of biological heterogeneity among different cell types within tissues at single-cell resolution. Inferring the cell types within a tissue is foundational for downstream research. Most existing methods for cell type inference from scRNA-seq data use highly variable genes (HVGs) with higher expression levels as clustering features, overlooking the contribution of HVGs with lower expression levels. To address this, we designed a new cell type inference method for scRNA-seq data, termed scLEGA. scLEGA employs a novel zero-inflated negative binomial (ZINB) loss function that fully accounts for the contribution of low-expression genes, and it combines two distinct scRNA-seq clustering strategies through a multi-head attention mechanism. It uses a low-expression-optimized denoising autoencoder, built on the new ZINB model, to extract low-dimensional features and handle dropout events, together with a GCN-based graph autoencoder (GAE) that leverages neighbor information to guide dimensionality reduction. The iterative fusion of denoising and topological embeddings helps scLEGA learn cluster-friendly cell representations in the hidden embedding space, where similar cells are brought closer together. Compared with 12 state-of-the-art cell type inference methods on 15 scRNA-seq datasets, scLEGA achieves superior clustering accuracy, scalability, and stability. The scLEGA code is freely available at https://github.com/Masonze/scLEGA-main.
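The abstract does not give the loss formula, so as a reference point, here is a minimal sketch of the standard ZINB negative log-likelihood that ZINB-based denoising autoencoders minimize per (cell, gene) entry, with mean μ, inverse dispersion θ and dropout probability π predicted by the decoder. It does not reproduce scLEGA's low-expression modification, whose exact form is not stated here; the function names `nb_log_prob`, `zinb_nll` and `mean_zinb_loss` are hypothetical.

```python
import math


def nb_log_prob(x: float, mu: float, theta: float) -> float:
    """Log-probability of count x under a negative binomial
    parameterized by mean mu and inverse dispersion theta."""
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1.0)
        + theta * (math.log(theta) - math.log(theta + mu))
        + x * (math.log(mu) - math.log(theta + mu))
    )


def zinb_nll(x: float, mu: float, theta: float, pi: float, eps: float = 1e-10) -> float:
    """Negative log-likelihood of one count under ZINB(mu, theta, pi).

    pi is the probability that the entry is an excess (dropout) zero.
    This is the standard ZINB loss, NOT scLEGA's low-expression variant.
    """
    mu = max(mu, eps)  # keep log(mu) finite
    if x == 0:
        # A zero can come from the dropout component or from the NB itself.
        nb_zero = (theta / (theta + mu)) ** theta
        return -math.log(pi + (1.0 - pi) * nb_zero + eps)
    # A positive count can only come from the NB component.
    return -(math.log(1.0 - pi + eps) + nb_log_prob(x, mu, theta))


def mean_zinb_loss(counts, mus, thetas, pis) -> float:
    """Average ZINB NLL over flattened (cell, gene) entries."""
    if not counts or not (len(counts) == len(mus) == len(thetas) == len(pis)):
        raise ValueError("inputs must be non-empty and of equal length")
    terms = [zinb_nll(x, m, t, p) for x, m, t, p in zip(counts, mus, thetas, pis)]
    return sum(terms) / len(terms)


# Toy usage: three observed counts with hypothetical decoder outputs.
print(mean_zinb_loss([0, 3, 10], [1.5, 2.8, 9.0], [2.0, 2.0, 2.0], [0.3, 0.05, 0.01]))
```

The zero branch is where dropout handling happens: a large π lets the model explain an observed zero as a technical dropout rather than forcing μ toward zero, which is the property a low-expression-aware variant would build on.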
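The graph autoencoder branch builds on the usual GCN recipe. Below is a minimal sketch of one propagation step, H' = ReLU(Â H W) with Â = D^{-1/2}(A + I)D^{-1/2}, plus the inner-product decoder σ(Z Zᵀ) that GAEs commonly use to reconstruct the cell graph. It assumes the cell graph is already given as a symmetric 0/1 adjacency matrix (for example, from k-nearest neighbours), uses plain Python lists instead of a tensor library, and all helper names are hypothetical; scLEGA's actual graph construction, layer sizes and decoder are not specified in the abstract.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(cols)]
            for i in range(len(a))]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 thanks to the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN layer: aggregate neighbour features, project with W, apply ReLU."""
    out = matmul(matmul(adj_norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def inner_product_decoder(z: Matrix) -> Matrix:
    """Reconstructed edge probabilities sigmoid(z_i . z_j)."""
    return [[sigmoid(sum(a * b for a, b in zip(zi, zj))) for zj in z] for zi in z]


# Toy graph: cells 0-1 and 1-2 are neighbours; 2-D features per cell.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
weights = [[1.0, -1.0], [1.0, 1.0]]
z = gcn_layer(normalize_adjacency(adj), features, weights)
print(inner_product_decoder(z))
```

Training a GAE pushes σ(z_i · z_j) toward 1 for neighbouring cells, which is how neighbor information pulls similar cells together in the embedding.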

[Image: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/59da/11281828/325af17a9aa5/bbae371f1.jpg]