scSwinFormer: A Transformer-Based Cell-Type Annotation Method for scRNA-Seq Data Using Smooth Gene Embedding and Global Features.

Affiliation

School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China.

Publication information

J Chem Inf Model. 2024 Aug 26;64(16):6316-6323. doi: 10.1021/acs.jcim.4c00616. Epub 2024 Aug 5.

Abstract

Single-cell omics techniques make it possible to analyze individual cells in biological samples, giving a more detailed understanding of cellular heterogeneity and biological systems. Accurate identification of cell types is critical for single-cell RNA sequencing (scRNA-seq) analysis. However, scRNA-seq data are usually high-dimensional and sparse, which makes them challenging to analyze. Existing cell-type annotation methods are either constrained in how they model scRNA-seq data or do not consider long-term dependencies of characterized genes. In this work, we developed scSwinFormer, a Transformer-based deep learning method for cell-type annotation of large-scale scRNA-seq data. The smooth gene embedding module performs sequence modeling of the scRNA-seq data, and the self-attention module then captures the potential dependencies of genes. Subsequently, the Cell Token synthesizes the global information inherent in the scRNA-seq data, facilitating accurate cell-type annotation. We evaluated the model against current state-of-the-art scRNA-seq cell-type annotation methods on multiple real data sets. scSwinFormer outperforms these methods in both external and benchmark data set experiments.
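
The pipeline described in the abstract (gene embedding, self-attention, Cell Token, classification) can be illustrated with a minimal NumPy sketch. The abstract does not specify how the smooth gene embedding works, so everything concrete here is an assumption made for illustration: the expression vector is split into fixed-size gene patches that are linearly projected into tokens, a single-head attention layer stands in for the self-attention module, and the Cell Token is treated as a learnable vector prepended to the sequence. The names `init_params`, `annotate_cell`, `patch_size`, and `d_model` are hypothetical and do not come from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(patch_size, d_model, n_classes, seed=0):
    # Randomly initialised weights; a real model would learn these.
    rng = np.random.default_rng(seed)
    scale = 0.02
    return {
        "W_embed": rng.normal(0, scale, (patch_size, d_model)),
        "cell_token": rng.normal(0, scale, (1, d_model)),
        "W_q": rng.normal(0, scale, (d_model, d_model)),
        "W_k": rng.normal(0, scale, (d_model, d_model)),
        "W_v": rng.normal(0, scale, (d_model, d_model)),
        "W_out": rng.normal(0, scale, (d_model, n_classes)),
    }

def annotate_cell(expr, params, patch_size):
    """Return cell-type probabilities for one cell's expression vector."""
    # Assumed embedding: zero-pad so the gene vector splits into equal patches.
    pad = (-expr.shape[0]) % patch_size
    patches = np.pad(expr, (0, pad)).reshape(-1, patch_size)
    tokens = patches @ params["W_embed"]             # (n_tokens, d_model)

    # Prepend the Cell Token so it can attend to every gene token.
    seq = np.vstack([params["cell_token"], tokens])  # (n_tokens + 1, d_model)

    # Single-head scaled dot-product self-attention with a residual connection.
    q, k, v = seq @ params["W_q"], seq @ params["W_k"], seq @ params["W_v"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    out = seq + attn @ v

    # The Cell Token's output row summarises the whole cell; classify from it.
    return softmax(out[0] @ params["W_out"])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic sparse counts standing in for one cell (not real data).
    expr = rng.poisson(0.3, size=1000).astype(float)
    params = init_params(patch_size=16, d_model=32, n_classes=5)
    print(annotate_cell(expr, params, patch_size=16))
```

Reading the prediction from the Cell Token row rather than averaging all gene tokens mirrors the class-token convention used in Transformer classifiers; whether scSwinFormer uses exactly this arrangement is not stated in the abstract.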

