scGAD: a new task and end-to-end framework for generalized cell type annotation and discovery.

Affiliations

School of Mathematical Sciences, Peking University, Beijing, China.

Huawei Technologies Co., Ltd., Beijing, China.

Publication information

Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbad045.

DOI: 10.1093/bib/bbad045
PMID: 36869836

Abstract

The rapid development of single-cell RNA sequencing (scRNA-seq) technology allows us to study gene expression heterogeneity at the cellular level. Cell annotation is the basis for subsequent downstream analysis in single-cell data mining. As more and more well-annotated scRNA-seq reference data become available, many automatic annotation methods have sprung up in order to simplify the cell annotation process on unlabeled target data. However, existing methods rarely explore the fine-grained semantic knowledge of novel cell types absent from the reference data, and they are usually susceptible to batch effects on the classification of seen cell types. Taking into consideration the limitations above, this paper proposes a new and practical task called generalized cell type annotation and discovery for scRNA-seq data whereby target cells are labeled with either seen cell types or cluster labels, instead of a unified 'unassigned' label. To accomplish this, we carefully design a comprehensive evaluation benchmark and propose a novel end-to-end algorithmic framework called scGAD. Specifically, scGAD first builds the intrinsic correspondences on seen and novel cell types by retrieving geometrically and semantically mutual nearest neighbors as anchor pairs. Together with the similarity affinity score, a soft anchor-based self-supervised learning module is then designed to transfer the known label information from reference data to target data and aggregate the new semantic knowledge within target data in the prediction space. To enhance the inter-type separation and intra-type compactness, we further propose a confidential prototype self-supervised learning paradigm to implicitly capture the global topological structure of cells in the embedding space. Such a bidirectional dual alignment mechanism between embedding space and prediction space can better handle batch effect and cell type shift. 
Extensive results on massive simulation datasets and real datasets demonstrate the superiority of scGAD over various state-of-the-art clustering and annotation methods. We also implement marker gene identification to validate the effectiveness of scGAD in clustering novel cell types and their biological significance. To the best of our knowledge, we are the first to introduce this new and practical task and propose an end-to-end algorithmic framework to solve it. Our method scGAD is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/aimeeyaoyao/scGAD.
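
The anchor-pair idea in the abstract can be illustrated with a small, self-contained sketch. This is not the scGAD implementation: the function names (`cosine`, `top_k`, `mutual_nearest_neighbors`), the use of cosine similarity, the parameter `k`, and the toy embeddings are all assumptions made for illustration. It also covers only the geometric side of the search in a single embedding space, whereas the abstract describes neighbors that are mutual both geometrically and semantically. A reference cell and a target cell form an anchor pair when each appears in the other's k nearest neighbors, and the similarity is returned alongside the pair as a possible soft weight.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query, candidates, k):
    """Indices of the k candidates most similar to the query."""
    order = sorted(
        range(len(candidates)),
        key=lambda j: cosine(query, candidates[j]),
        reverse=True,
    )
    return set(order[:k])


def mutual_nearest_neighbors(reference, target, k=1):
    """Return (ref_index, target_index, similarity) for every mutual k-NN pair.

    Hypothetical helper: a pair is kept only when the target cell is among the
    reference cell's k nearest targets AND the reference cell is among the
    target cell's k nearest references.
    """
    ref_to_tgt = [top_k(r, target, k) for r in reference]
    tgt_to_ref = [top_k(t, reference, k) for t in target]
    pairs = []
    for i, neighbors in enumerate(ref_to_tgt):
        for j in sorted(neighbors):
            if i in tgt_to_ref[j]:
                pairs.append((i, j, cosine(reference[i], target[j])))
    return pairs


# Toy embeddings (made up): two reference cells, three target cells.
reference_cells = [[1.0, 0.0], [0.0, 1.0]]
target_cells = [[0.9, 0.1], [0.1, 0.9], [-1.0, -1.0]]

for ref_idx, tgt_idx, score in mutual_nearest_neighbors(reference_cells, target_cells):
    print(ref_idx, tgt_idx, round(score, 3))
```

In this toy input, target cells 0 and 1 pair with reference cells 0 and 1, while target cell 2 pairs with nothing. Under the task described above, such unpaired cells are the ones that would be grouped into cluster labels rather than given a seen cell type.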

Similar articles

1. scGAD: a new task and end-to-end framework for generalized cell type annotation and discovery.
   Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbad045.
2. scBOL: a universal cell type identification framework for single-cell and spatial transcriptomics data.
   Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae188.
3. Integrating Deep Supervised, Self-Supervised and Unsupervised Learning for Single-Cell RNA-seq Clustering and Annotation.
   Genes (Basel). 2020 Jul 14;11(7):792. doi: 10.3390/genes11070792.
4. Single-cell RNA-seq data semi-supervised clustering and annotation via structural regularized domain adaptation.
   Bioinformatics. 2021 May 5;37(6):775-784. doi: 10.1093/bioinformatics/btaa908.
5. scNAME: neighborhood contrastive clustering with ancillary mask estimation for scRNA-seq data.
   Bioinformatics. 2022 Mar 4;38(6):1575-1583. doi: 10.1093/bioinformatics/btac011.
6. Deep enhanced constraint clustering based on contrastive learning for scRNA-seq data.
   Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad222.
7. scEVOLVE: cell-type incremental annotation without forgetting for single-cell RNA-seq data.
   Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae039.
8. TripletCell: a deep metric learning framework for accurate annotation of cell types at the single-cell level.
   Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad132.
9. scEMAIL: Universal and Source-free Annotation Method for scRNA-seq Data with Novel Cell-type Perception.
   Genomics Proteomics Bioinformatics. 2022 Oct;20(5):939-958. doi: 10.1016/j.gpb.2022.12.008. Epub 2023 Jan 3.
10. Learning deep features and topological structure of cells for clustering of scRNA-sequencing data.
    Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac068.

Cited by

1. scRDAN: a robust domain adaptation network for cell type annotation across single-cell RNA sequencing data.
   Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf344.
2. An overview of computational methods in single-cell transcriptomic cell type annotation.
   Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf207.
3. A gene regulatory network-aware graph learning method for cell identity annotation in single-cell RNA-seq data.
   Genome Res. 2024 Aug 20;34(7):1036-1051. doi: 10.1101/gr.278439.123.
4. scBOL: a universal cell type identification framework for single-cell and spatial transcriptomics data.
   Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae188.
5. scEVOLVE: cell-type incremental annotation without forgetting for single-cell RNA-seq data.
   Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae039.