scTab: Scaling cross-tissue single-cell annotation models.

Affiliations

Department of Computational Health, Institute of Computational Biology, Helmholtz, Munich, Germany.

School of Computing, Information and Technology, Technical University of Munich, Munich, Germany.

Publication information

Nat Commun. 2024 Aug 4;15(1):6611. doi: 10.1038/s41467-024-51059-5.

DOI: 10.1038/s41467-024-51059-5
PMID: 39098889
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11298532/

Abstract

Identifying cellular identities is a key use case in single-cell transcriptomics. While machine learning has been leveraged to automate cell annotation predictions for some time, there has been little progress in scaling neural networks to large data sets and in constructing models that generalize well across diverse tissues. Here, we propose scTab, an automated cell type prediction model specific to tabular data, and train it using a novel data augmentation scheme across a large corpus of single-cell RNA-seq observations (22.2 million cells). In this context, we show that cross-tissue annotation requires nonlinear models and that the performance of scTab scales both in terms of training dataset size and model size. Additionally, we show that the proposed data augmentation schema improves model generalization. In summary, we introduce a de novo cell type prediction model for single-cell RNA-seq data that can be trained across a large-scale collection of curated datasets and demonstrate the benefits of using deep learning methods in this paradigm.
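
The abstract's claim that cross-tissue annotation requires nonlinear models is an empirical result on real data. The sketch below does not reproduce it; it is only a toy illustration, under stated assumptions, of the general idea that a linear decision boundary can fail where a model with one nonlinear layer succeeds. The four-cell, two-gene matrix, the XOR-style labels, the hand-picked hidden-layer weights, and the helper names (`with_bias`, `fit_and_score`, `relu_hidden_layer`) are all hypothetical and are not taken from scTab.

```python
import numpy as np

# Toy data (hypothetical): 4 "cells" x 2 "genes" with XOR-style labels.
# No straight line separates the two classes in this feature space.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1, 1, 0])


def with_bias(features):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_and_score(features, labels):
    """Fit a least-squares linear readout and return its training accuracy."""
    A = with_bias(features)
    w, *_ = np.linalg.lstsq(A, labels.astype(float), rcond=None)
    predictions = (A @ w > 0.5).astype(int)
    return float(np.mean(predictions == labels))


def relu_hidden_layer(features):
    """One hidden ReLU layer with hand-picked weights (illustrative only)."""
    W = np.array([[1.0, 1.0], [1.0, 1.0]])  # both units sum the two genes
    b = np.array([0.0, -1.0])               # second unit activates only when both are on
    return np.maximum(features @ W + b, 0.0)


# Linear readout on the raw features versus the same readout after one ReLU layer.
acc_linear = fit_and_score(X, y)
acc_nonlinear = fit_and_score(relu_hidden_layer(X), y)
print("linear accuracy:", acc_linear)
print("nonlinear accuracy:", acc_nonlinear)
```

The same linear readout is used in both cases, so any difference in accuracy comes from the nonlinear hidden layer alone. In a real model the hidden weights would be learned from data rather than set by hand.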

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3770/11298532/b5560b2f128e/41467_2024_51059_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3770/11298532/1c6f791b4685/41467_2024_51059_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3770/11298532/d3e243899014/41467_2024_51059_Fig3_HTML.jpg
