Large-scale foundation model on single-cell transcriptomics.

Affiliations

MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing, China.

BioMap, Beijing, China.

Publication information

Nat Methods. 2024 Aug;21(8):1481-1491. doi: 10.1038/s41592-024-02305-7. Epub 2024 Jun 6.

DOI:10.1038/s41592-024-02305-7
PMID:38844628
Abstract

Large pretrained models have become foundation models leading to breakthroughs in natural language processing and related fields. Developing foundation models for deciphering the 'languages' of cells and facilitating biomedical research is promising yet challenging. Here we developed a large pretrained model scFoundation, also named 'xTrimoscFoundation', with 100 million parameters covering about 20,000 genes, pretrained on over 50 million human single-cell transcriptomic profiles. scFoundation is a large-scale model in terms of the size of trainable parameters, dimensionality of genes and volume of training data. Its asymmetric transformer-like architecture and pretraining task design empower effectively capturing complex context relations among genes in a variety of cell types and states. Experiments showed its merit as a foundation model that achieved state-of-the-art performances in a diverse array of single-cell analysis tasks such as gene expression enhancement, tissue drug response prediction, single-cell drug response classification, single-cell perturbation prediction, cell type annotation and gene module inference.
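
The abstract names an "asymmetric transformer-like architecture" but does not spell out the mechanism. The sketch below illustrates one plausible reading, stated here as an assumption rather than taken from the abstract: the encoder attends only to genes with non-zero expression, and a later decoder stage covers the full gene list. Because single-cell profiles are sparse, this shrinks the quadratic attention cost on the encoder side. The function names and the toy expression vector are hypothetical and are not the authors' code.

```python
from typing import List, Tuple


def split_expressed(expr: List[float]) -> Tuple[List[int], List[int]]:
    """Return (indices of non-zero genes, indices of zero genes)."""
    expressed = [i for i, v in enumerate(expr) if v != 0]
    silent = [i for i, v in enumerate(expr) if v == 0]
    return expressed, silent


def attention_cost(seq_len: int) -> int:
    """Number of pairwise interactions in full self-attention over seq_len tokens."""
    return seq_len * seq_len


# Toy cell with 10 genes, 3 of them expressed (illustrative values only).
cell = [0.0, 2.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 1.0, 0.0]

expressed, silent = split_expressed(cell)

# Assumed asymmetry: the encoder sees only the expressed genes,
# whereas a symmetric design would attend over every gene.
encoder_cost = attention_cost(len(expressed))
full_cost = attention_cost(len(cell))

print(expressed)                # indices passed to the encoder
print(encoder_cost, full_cost)  # pairwise interactions: sparse vs. full
```

At the scale the abstract describes (about 20,000 genes per cell), the same arithmetic applies: if only a small fraction of genes is expressed in a given cell, restricting the encoder to those genes reduces its attention cost roughly by the square of that fraction.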

