scDiffusion: conditional generation of high-quality single-cell data using diffusion model.

Affiliations

MOE Key Lab of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China.

School of Life Sciences and School of Medicine, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China.

Publication information

Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae518.

DOI:10.1093/bioinformatics/btae518
PMID:39171840
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11368386/
Abstract

MOTIVATION

Single-cell RNA sequencing (scRNA-seq) data are important for studying the laws of life at single-cell level. However, it is still challenging to obtain enough high-quality scRNA-seq data. To mitigate the limited availability of data, generative models have been proposed to computationally generate synthetic scRNA-seq data. Nevertheless, the data generated with current models are not very realistic yet, especially when we need to generate data with controlled conditions. In the meantime, diffusion models have shown their power in generating data with high fidelity, providing a new opportunity for scRNA-seq generation.

RESULTS

In this study, we developed scDiffusion, a generative model combining a diffusion model with a foundation model to generate high-quality scRNA-seq data with controlled conditions. We designed multiple classifiers to guide the diffusion process simultaneously, enabling scDiffusion to generate data under multiple condition combinations. We also proposed a new control strategy called Gradient Interpolation. This strategy allows the model to generate continuous trajectories of cell development from a given cell state. Experiments showed that scDiffusion could generate single-cell gene expression data closely resembling real scRNA-seq data. scDiffusion can also conditionally generate data for specific cell types, including rare ones. Furthermore, its multiple-condition generation can produce cell types that are absent from the training data. Leveraging the Gradient Interpolation strategy, we generated a continuous developmental trajectory of mouse embryonic cells. These experiments demonstrate that scDiffusion is a powerful tool for augmenting real scRNA-seq data and can provide insights into cell fate research.
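
The two control ideas named in the abstract, guiding one diffusion process with several classifiers at once and interpolating between guidance gradients, can be illustrated with a small toy sketch. This is not the scDiffusion implementation: the linear softmax classifiers, the function names (`grad_log_prob`, `multi_classifier_guidance`, `interpolated_guidance`, `guided_mean`), the guidance scales, and the reading of Gradient Interpolation as a convex blend of two label gradients are all assumptions made for illustration. The only standard ingredient is the usual classifier-guidance rule, which shifts the mean of a reverse diffusion step by the step variance times the gradient of log p(y | x).

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax of a 1-D logit vector."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def grad_log_prob(W, b, x, y):
    """Gradient w.r.t. x of log p(y | x) for a toy linear softmax classifier."""
    p = np.exp(log_softmax(W @ x + b))
    # d/dx log p_y = W[y] - sum_k p_k * W[k]
    return W[y] - p @ W

def multi_classifier_guidance(x, classifiers, targets, scales):
    """Sum the scaled gradients of several classifiers, one per condition."""
    g = np.zeros_like(x)
    for (W, b), y, s in zip(classifiers, targets, scales):
        g += s * grad_log_prob(W, b, x, y)
    return g

def interpolated_guidance(x, classifier, y_start, y_end, alpha, scale=1.0):
    """Blend the gradients toward two labels; alpha=0 is y_start, alpha=1 is y_end."""
    W, b = classifier
    g_start = grad_log_prob(W, b, x, y_start)
    g_end = grad_log_prob(W, b, x, y_end)
    return scale * ((1.0 - alpha) * g_start + alpha * g_end)

def guided_mean(mean, variance, guidance):
    """Standard classifier-guidance shift of a reverse-step mean."""
    return mean + variance * guidance

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 4  # toy latent dimension (hypothetical)
    cell_type_clf = (rng.normal(size=(3, dim)), rng.normal(size=3))  # 3 toy cell types
    condition_clf = (rng.normal(size=(2, dim)), rng.normal(size=2))  # 2 toy conditions
    x = rng.normal(size=dim)

    # Guide toward cell type 1 and condition 0 at the same time.
    g = multi_classifier_guidance(x, [cell_type_clf, condition_clf], [1, 0], [2.0, 0.5])
    print("guided mean:", guided_mean(x, 0.1, g))

    # Sweep alpha to move the guidance from cell type 0 toward cell type 2.
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, interpolated_guidance(x, cell_type_clf, 0, 2, alpha))
```

In this sketch each condition gets its own classifier and its own scale, so conditions can be combined freely at sampling time, and sweeping `alpha` from 0 to 1 gives a family of intermediate guidance directions between two states.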

AVAILABILITY AND IMPLEMENTATION

scDiffusion is openly available at the GitHub repository https://github.com/EperLuo/scDiffusion or Zenodo https://zenodo.org/doi/10.5281/zenodo.13268742.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c7c/11368386/97dbc50ca7d2/btae518f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c7c/11368386/450649ed144f/btae518f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c7c/11368386/afd3f49b5562/btae518f3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c7c/11368386/5e7ec53777fb/btae518f4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c7c/11368386/08bdf31b8bce/btae518f5.jpg
