CaSTLe - Classification of single cells by transfer learning: Harnessing the power of publicly available single cell RNA sequencing experiments to annotate new experiments.

Affiliations

Department of Life Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel.

Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel.

Publication information

PLoS One. 2018 Oct 10;13(10):e0205499. doi: 10.1371/journal.pone.0205499. eCollection 2018.

DOI:10.1371/journal.pone.0205499

PMID:30304022

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6179251/

Abstract

Single-cell RNA sequencing (scRNA-seq) is an emerging technology for profiling the gene expression of thousands of cells at single-cell resolution. Currently, the labeling of cells in an scRNA-seq dataset is performed by manually characterizing clusters of cells or by fluorescence-activated cell sorting (FACS). Both methods have inherent drawbacks: the first depends on the clustering algorithm used and on the knowledge and arbitrary decisions of the annotator, and the second involves an experimental step in addition to the sequencing and cannot be incorporated into the higher-throughput scRNA-seq methods. We therefore suggest a different approach for cell labeling, namely, classifying cells from scRNA-seq datasets by using a model transferred from different (previously labeled) datasets. This approach can complement existing methods and, in some cases, even replace them. Such a transfer-learning framework requires selecting informative features and training a classifier. The specific implementation of the framework that we propose, designated "CaSTLe" (classification of single cells by transfer learning), is based on a robust feature engineering workflow and an XGBoost classification model built on these features. Evaluation of CaSTLe against two benchmark feature-selection and classification methods showed that it outperformed the benchmark methods in most cases and consistently yielded satisfactory classification accuracy. CaSTLe has the additional advantage of being parallelizable and well suited to large datasets. We showed that it was possible to classify cell types using transfer learning, even when the databases contained a very small number of genes, and our study thus indicates the potential applicability of this approach for analysis of scRNA-seq datasets.
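
The two steps the abstract names, selecting informative features and training a classifier on a labeled source dataset to annotate a new target dataset, can be sketched roughly as follows. This is an illustration under stated assumptions, not the CaSTLe implementation. The abstract does not spell out the feature-engineering rules, so the selection rule here (keep the genes shared by both datasets and rank them by mean expression) is an assumption. A nearest-centroid classifier stands in for the XGBoost model so that the sketch needs only the Python standard library. All function names and expression values are hypothetical toy examples.

```python
from statistics import mean


def select_features(source_cells, target_cells, k):
    """Assumed rule: the k shared genes with the highest mean expression."""
    shared = set(source_cells[0]) & set(target_cells[0])
    all_cells = source_cells + target_cells
    ranked = sorted(shared, key=lambda g: (-mean(c[g] for c in all_cells), g))
    return ranked[:k]


def train_centroids(cells, labels, genes):
    """Stand-in for XGBoost: one mean-expression centroid per cell type."""
    grouped = {}
    for cell, label in zip(cells, labels):
        grouped.setdefault(label, []).append([cell[g] for g in genes])
    return {lab: [mean(col) for col in zip(*rows)] for lab, rows in grouped.items()}


def predict(centroids, cells, genes):
    """Label each cell with its nearest centroid (squared Euclidean distance)."""
    def dist(vec, centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    labels = []
    for cell in cells:
        vec = [cell[g] for g in genes]
        labels.append(min(sorted(centroids), key=lambda lab: dist(vec, centroids[lab])))
    return labels


# Toy source dataset (previously labeled) and target dataset (to annotate).
# Each cell maps gene name -> expression value; all values are made up.
source_cells = [
    {"INS": 9.0, "GCG": 0.5, "ACTB": 5.0, "NOISE": 0.1, "SRC_ONLY": 1.0},
    {"INS": 8.0, "GCG": 1.0, "ACTB": 5.5, "NOISE": 0.1, "SRC_ONLY": 0.0},
    {"INS": 0.5, "GCG": 9.5, "ACTB": 5.0, "NOISE": 0.1, "SRC_ONLY": 2.0},
    {"INS": 1.0, "GCG": 8.5, "ACTB": 4.5, "NOISE": 0.1, "SRC_ONLY": 1.0},
]
source_labels = ["beta", "beta", "alpha", "alpha"]
target_cells = [
    {"INS": 7.5, "GCG": 0.0, "ACTB": 6.0, "NOISE": 0.1, "TGT_ONLY": 3.0},
    {"INS": 0.0, "GCG": 7.0, "ACTB": 5.0, "NOISE": 0.1, "TGT_ONLY": 0.5},
]

genes = select_features(source_cells, target_cells, k=3)
model = train_centroids(source_cells, source_labels, genes)
predicted = predict(model, target_cells, genes)
print(genes)      # ['ACTB', 'GCG', 'INS']
print(predicted)  # ['beta', 'alpha']
```

Genes present in only one dataset are dropped before ranking, because a model trained on the source can only use features that also exist in the target. In a real pipeline the centroid step would be replaced by a gradient-boosted tree model, which is what the abstract describes.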

Similar articles

A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa.

PLoS Comput Biol. 2018 Apr 9;14(4):e1006053. doi: 10.1371/journal.pcbi.1006053. eCollection 2018 Apr.

On the use of QDE-SVM for gene feature selection and cell type classification from scRNA-seq data.

PLoS One. 2023 Oct 19;18(10):e0292961. doi: 10.1371/journal.pone.0292961. eCollection 2023.

Evaluation of single-cell classifiers for single-cell RNA sequencing data sets.

Brief Bioinform. 2020 Sep 25;21(5):1581-1595. doi: 10.1093/bib/bbz096.

JingleBells: A Repository of Immune-Related Single-Cell RNA-Sequencing Datasets.

J Immunol. 2017 May 1;198(9):3375-3379. doi: 10.4049/jimmunol.1700272.

A hybrid deep clustering approach for robust cell type profiling using single-cell RNA-seq data.

RNA. 2020 Oct;26(10):1303-1319. doi: 10.1261/rna.074427.119. Epub 2020 Jun 12.

Data Analysis in Single-Cell Transcriptome Sequencing.

Methods Mol Biol. 2018;1754:311-326. doi: 10.1007/978-1-4939-7717-8_18.

DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data.

Bioinformatics. 2018 Jan 1;34(1):139-146. doi: 10.1093/bioinformatics/btx490.

Random forest based similarity learning for single cell RNA sequencing data.

Bioinformatics. 2018 Jul 1;34(13):i79-i88. doi: 10.1093/bioinformatics/bty260.

Using transfer learning from prior reference knowledge to improve the clustering of single-cell RNA-Seq data.

Sci Rep. 2019 Dec 30;9(1):20353. doi: 10.1038/s41598-019-56911-z.

Cited by

scSorterDL: a deep neural network-enhanced ensemble LDAs for single cell classifications.

Brief Bioinform. 2025 Aug 31;26(5). doi: 10.1093/bib/bbaf446.

HiCat: a semi-supervised approach for cell type annotation.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf428.

Interpretable machine learning-guided single-cell mapping deciphers multi-lineage pancreatic dysregulation in type 2 diabetes.

Cardiovasc Diabetol. 2025 Jul 24;24(1):300. doi: 10.1186/s12933-025-02865-8.

Out of distribution learning in bioinformatics: advancements and challenges.

Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf294.

Mapping Cell Identity from scRNA-seq: A primer on computational methods.

Comput Struct Biotechnol J. 2025 Apr 2;27:1559-1569. doi: 10.1016/j.csbj.2025.03.051. eCollection 2025.

adverSCarial: assessing the vulnerability of single-cell RNA-sequencing classifiers to adversarial attacks.

Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf168.

scTrans: Sparse attention powers fast and accurate cell type annotation in single-cell RNA-seq data.

PLoS Comput Biol. 2025 Apr 4;21(4):e1012904. doi: 10.1371/journal.pcbi.1012904. eCollection 2025 Apr.

Mouse-Geneformer: A deep learning model for mouse single-cell transcriptome and its cross-species utility.

PLoS Genet. 2025 Mar 19;21(3):e1011420. doi: 10.1371/journal.pgen.1011420. eCollection 2025 Mar.

AnnoGCD: a generalized category discovery framework for automatic cell type annotation.

NAR Genom Bioinform. 2024 Dec 4;6(4):lqae166. doi: 10.1093/nargab/lqae166. eCollection 2024 Dec.

eMCI: An Explainable Multimodal Correlation Integration Model for Unveiling Spatial Transcriptomics and Intercellular Signaling.

Research (Wash D C). 2024 Nov 1;7:0522. doi: 10.34133/research.0522. eCollection 2024.

References

Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database.

PLoS Comput Biol. 2018 Jun 25;14(6):e1006245. doi: 10.1371/journal.pcbi.1006245. eCollection 2018 Jun.

CellAtlasSearch: a scalable search engine for single cells.

Nucleic Acids Res. 2018 Jul 2;46(W1):W141-W147. doi: 10.1093/nar/gky421.

scmap: projection of single-cell RNA-seq data across data sets.

Nat Methods. 2018 May;15(5):359-362. doi: 10.1038/nmeth.4644. Epub 2018 Apr 2.

Characterizing the replicability of cell types defined by single cell RNA-sequencing data using MetaNeighbor.

Nat Commun. 2018 Feb 28;9(1):884. doi: 10.1038/s41467-018-03282-0.

The embryo at single-cell transcriptome resolution.

Science. 2017 Oct 13;358(6360):194-199. doi: 10.1126/science.aan3235. Epub 2017 Aug 31.

Challenges and emerging directions in single-cell analysis.

Genome Biol. 2017 May 8;18(1):84. doi: 10.1186/s13059-017-1218-y.

JingleBells: A Repository of Immune-Related Single-Cell RNA-Sequencing Datasets.

J Immunol. 2017 May 1;198(9):3375-3379. doi: 10.4049/jimmunol.1700272.

Single-Cell Transcriptomic Analysis Defines Heterogeneity and Transcriptional Dynamics in the Adult Neural Stem Cell Lineage.

Cell Rep. 2017 Jan 17;18(3):777-790. doi: 10.1016/j.celrep.2016.12.060.

Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R.

Bioinformatics. 2017 Apr 15;33(8):1179-1186. doi: 10.1093/bioinformatics/btw777.

Single-Cell Transcriptome Profiling of Human Pancreatic Islets in Health and Type 2 Diabetes.

Cell Metab. 2016 Oct 11;24(4):593-607. doi: 10.1016/j.cmet.2016.08.020. Epub 2016 Sep 22.
