scPretrain: multi-task self-supervised learning for cell-type classification.

Authors

Zhang Ruiyi, Luo Yunan, Ma Jianzhu, Zhang Ming, Wang Sheng

Affiliations

School of EECS, Peking University, Beijing, China.

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA.

Publication information

Bioinformatics. 2022 Mar 4;38(6):1607-1614. doi: 10.1093/bioinformatics/btac007.

DOI: 10.1093/bioinformatics/btac007
PMID: 34999749

Abstract

MOTIVATION

Rapidly generated scRNA-seq datasets enable us to understand cellular differences and the function of each individual cell at single-cell resolution. Cell-type classification, which aims at characterizing and labeling groups of cells according to their gene expression, is one of the most important steps for single-cell analysis. To facilitate the manual curation process, supervised learning methods have been used to automatically classify cells. Most of the existing supervised learning approaches only utilize annotated cells in the training step while ignoring the more abundant unannotated cells. In this article, we proposed scPretrain, a multi-task self-supervised learning approach that jointly considers annotated and unannotated cells for cell-type classification. scPretrain consists of a pre-training step and a fine-tuning step. In the pre-training step, scPretrain uses a multi-task learning framework to train a feature extraction encoder based on each dataset's pseudo-labels, where only unannotated cells are used. In the fine-tuning step, scPretrain fine-tunes this feature extraction encoder using the limited annotated cells in a new dataset.
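
The two-step procedure described above can be illustrated with a small sketch. This is not the authors' implementation (their code is at the repository listed under Availability); it is a toy version under several assumptions that the abstract does not state: pseudo-labels come from plain k-means, the encoder is a single ReLU layer, each unannotated dataset gets its own softmax head on the shared encoder, and the expression matrices are synthetic. All function names (`kmeans_labels`, `pretrain`, `fine_tune`, `toy_dataset`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kmeans_labels(X, k, iters=20):
    """Pseudo-labels from plain k-means (an assumed choice, not from the abstract)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def train_step(X, y, W, b, V, lr):
    """One full-batch gradient step on softmax cross-entropy; updates W, b, V in place."""
    H = np.maximum(X @ W + b, 0.0)        # shared encoder (one ReLU layer)
    P = softmax(H @ V)                    # task-specific head
    rows = np.arange(len(y))
    loss = -np.log(P[rows, y] + 1e-12).mean()
    G = P.copy()
    G[rows, y] -= 1.0
    G /= len(y)                           # gradient of the loss w.r.t. the logits
    dH = (G @ V.T) * (H > 0)              # backpropagate through head and ReLU
    V -= lr * (H.T @ G)
    W -= lr * (X.T @ dH)
    b -= lr * dH.sum(axis=0)
    return loss

def pretrain(datasets, n_hidden=16, k=3, epochs=200, lr=0.05):
    """Multi-task pre-training: one pseudo-label head per unannotated dataset."""
    n_genes = datasets[0].shape[1]
    W = rng.normal(0.0, 0.1, size=(n_genes, n_hidden))
    b = np.zeros(n_hidden)
    tasks = [(X, kmeans_labels(X, k), rng.normal(0.0, 0.1, size=(n_hidden, k)))
             for X in datasets]
    for _ in range(epochs):
        for X, y, V in tasks:             # every task updates the shared encoder
            train_step(X, y, W, b, V, lr)
    return W, b

def fine_tune(X, y, W, b, n_classes, epochs=200, lr=0.05):
    """Fine-tune a copy of the pre-trained encoder with a new head on annotated cells."""
    W, b = W.copy(), b.copy()
    V = rng.normal(0.0, 0.1, size=(W.shape[1], n_classes))
    for _ in range(epochs):
        loss = train_step(X, y, W, b, V, lr)
    return W, b, V, loss

def predict(X, W, b, V):
    return (np.maximum(X @ W + b, 0.0) @ V).argmax(axis=1)

def toy_dataset(n_cells, n_genes=20, n_types=3):
    """Synthetic stand-in for an expression matrix: one marker gene per cell type."""
    y = rng.integers(0, n_types, size=n_cells)
    means = 2.0 * np.eye(n_types, n_genes)
    X = means[y] + rng.normal(0.0, 0.5, size=(n_cells, n_genes))
    return X, y

# Pre-train on four datasets whose labels are discarded, then fine-tune on 30 annotated cells.
unlabeled = [toy_dataset(200)[0] for _ in range(4)]
W0, b0 = pretrain(unlabeled)
X_new, y_new = toy_dataset(30)
W_ft, b_ft, V_ft, final_loss = fine_tune(X_new, y_new, W0, b0, n_classes=3)
pred = predict(X_new, W_ft, b_ft, V_ft)
print("final fine-tuning loss:", round(float(final_loss), 4))
print("training accuracy:", float((pred == y_new).mean()))
```

The pseudo-label heads are discarded after pre-training; only the encoder weights carry over, which is what lets a small annotated set suffice in the second step.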

RESULTS

We evaluated scPretrain on 60 diverse datasets from different technologies, species and organs, and obtained a significant improvement on both cell-type classification and cell clustering. Moreover, the representations obtained by scPretrain in the pre-training step also enhanced the performance of conventional classifiers, such as random forest, logistic regression and support-vector machines. scPretrain is able to effectively utilize the massive amount of unlabeled data and be applied to annotating increasingly generated scRNA-seq datasets.

AVAILABILITY AND IMPLEMENTATION

The data and code underlying this article are available in scPretrain: Multi-task self-supervised learning for cell type classification, at https://github.com/ruiyi-zhang/scPretrain and https://zenodo.org/record/5802306.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1. scPretrain: multi-task self-supervised learning for cell-type classification.
   Bioinformatics. 2022 Mar 4;38(6):1607-1614. doi: 10.1093/bioinformatics/btac007.
2. A machine learning-based method for automatically identifying novel cells in annotating single-cell RNA-seq data.
   Bioinformatics. 2022 Oct 31;38(21):4885-4892. doi: 10.1093/bioinformatics/btac617.
3. scCNC: a method based on capsule network for clustering scRNA-seq data.
   Bioinformatics. 2022 Aug 2;38(15):3703-3709. doi: 10.1093/bioinformatics/btac393.
4. scGAD: a new task and end-to-end framework for generalized cell type annotation and discovery.
   Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbad045.
5. Continually adapting pre-trained language model to universal annotation of single-cell RNA-seq data.
   Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae047.
6. Single-cell RNA-seq data semi-supervised clustering and annotation via structural regularized domain adaptation.
   Bioinformatics. 2021 May 5;37(6):775-784. doi: 10.1093/bioinformatics/btaa908.
7. CALLR: a semi-supervised cell-type annotation method for single-cell RNA sequencing data.
   Bioinformatics. 2021 Jul 12;37(Suppl_1):i51-i58. doi: 10.1093/bioinformatics/btab286.
8. scNAME: neighborhood contrastive clustering with ancillary mask estimation for scRNA-seq data.
   Bioinformatics. 2022 Mar 4;38(6):1575-1583. doi: 10.1093/bioinformatics/btac011.
9. netAE: semi-supervised dimensionality reduction of single-cell RNA sequencing to facilitate cell labeling.
   Bioinformatics. 2021 Apr 9;37(1):43-49. doi: 10.1093/bioinformatics/btaa669.
10. A Contrastive Predictive Coding-Based Classification Framework for Healthcare Sensor Data.
    J Healthc Eng. 2022 Mar 15;2022:5649253. doi: 10.1155/2022/5649253. eCollection 2022.

Cited by

1. Mapping Cell Identity from scRNA-seq: A primer on computational methods.
   Comput Struct Biotechnol J. 2025 Apr 2;27:1559-1569. doi: 10.1016/j.csbj.2025.03.051. eCollection 2025.
2. Self-Supervised Graph Representation Learning for Single-Cell Classification.
   Interdiscip Sci. 2025 Apr 3. doi: 10.1007/s12539-025-00700-y.
3. Spatial-Omics Methods and Applications.
   Methods Mol Biol. 2025;2880:101-146. doi: 10.1007/978-1-0716-4276-4_5.
4. Profiling cell identity and tissue architecture with single-cell and spatial transcriptomics.
   Nat Rev Mol Cell Biol. 2025 Jan;26(1):11-31. doi: 10.1038/s41580-024-00768-2. Epub 2024 Aug 21.
5. Large-scale foundation model on single-cell transcriptomics.
   Nat Methods. 2024 Aug;21(8):1481-1491. doi: 10.1038/s41592-024-02305-7. Epub 2024 Jun 6.
6. A scalable sparse neural network framework for rare cell type annotation of single-cell transcriptome data.
   Commun Biol. 2023 May 20;6(1):545. doi: 10.1038/s42003-023-04928-6.
7. Hierarchical cell-type identifier accurately distinguishes immune-cell subtypes enabling precise profiling of tissue microenvironment with single-cell RNA-sequencing.
   Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbad006.
8. Automatic cell type identification methods for single-cell RNA sequencing.
   Comput Struct Biotechnol J. 2021 Oct 20;19:5874-5887. doi: 10.1016/j.csbj.2021.10.027. eCollection 2021.