cfDiffusion: diffusion-based efficient generation of high quality scRNA-seq data with classifier-free guidance.

Author information

Zhang Tianjiao, Zhao Zhongqian, Ren Jixiang, Zhang Ziheng, Zhang Hongfei, Wang Guohua

Affiliations

College of Computer and Control Engineering, Northeast Forestry University, No. 26, Hexing Road, Xiangfang District, Harbin 150040, China.

Faculty of Computing, Harbin Institute of Technology, No. 92 Xidazhi Street, Nangang District, Harbin 150001, China.

Publication information

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf071.

Abstract

Single-cell RNA sequencing (scRNA-seq) technology provides a powerful means to measure gene expression at the individual cell level, thereby uncovering the intricate cellular heterogeneity that underlies various biological processes, including embryonic development, tumor metastasis, and microbial reproduction. However, the variable amounts of data generated across different cell types within tissues can compromise the accuracy of downstream analyses. Traditional approaches for generating scRNA-seq simulation data often rely on predefined data distributions, which can negatively impact the quality of the simulated data. Furthermore, these methods typically focus on simulating single-attribute cells, necessitating substantial additional data for the simulation of multi-attribute cells, which can lead to increased training times. To address these limitations, we propose cfDiffusion, a novel method grounded in diffusion models that incorporates Classifier-Free Guidance and a high-level feature caching mechanism. By leveraging Classifier-Free Guidance, cfDiffusion significantly reduces the training costs associated with model development compared to traditional Classifier Guidance methods. The integration of a caching mechanism further enhances efficiency by shortening inference times. While the inference duration of cfDiffusion remains longer than that of scDiffusion, it exhibits superior expressiveness and efficiency in generating multi-attribute single-cell data. Evaluated across datasets from multiple sequencing platforms, cfDiffusion consistently outperforms state-of-the-art models across various performance metrics. Additionally, cfDiffusion enables the simulation of single-cell data along a pseudo-time scale, facilitating advanced analyses such as tracking cell differentiation, investigating intercellular communication, and elucidating cellular heterogeneity.
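The abstract names Classifier-Free Guidance but does not spell out how it is applied. As a rough illustration only, the sketch below shows the general guidance rule commonly used with diffusion models: a single denoiser is queried once with the condition (for example a cell-type label) and once without it, and the two noise predictions are blended with a guidance scale. Everything here is an assumption for illustration, not the cfDiffusion implementation: the names `guided_noise`, `toy_denoiser`, and `guidance_scale` are hypothetical, and the toy denoiser merely stands in for a trained network.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical denoiser signature (an assumption, not taken from the paper):
# (noisy expression vector, timestep, condition or None) -> predicted noise
Denoiser = Callable[[Sequence[float], int, Optional[int]], List[float]]


def guided_noise(
    denoiser: Denoiser,
    x_t: Sequence[float],
    t: int,
    condition: int,
    guidance_scale: float,
) -> List[float]:
    """Blend conditional and unconditional noise predictions.

    Uses the common form eps = eps_uncond + s * (eps_cond - eps_uncond):
    s = 0 ignores the condition, s = 1 is plain conditional sampling,
    and s > 1 pushes the sample further toward the condition.
    """
    eps_uncond = denoiser(x_t, t, None)       # condition dropped
    eps_cond = denoiser(x_t, t, condition)    # condition supplied
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def toy_denoiser(x_t: Sequence[float], t: int, condition: Optional[int]) -> List[float]:
    """Stand-in for a trained network: scales the input and shifts it by the label."""
    shift = 0.0 if condition is None else float(condition)
    return [0.1 * v + shift for v in x_t]


if __name__ == "__main__":
    print(guided_noise(toy_denoiser, [1.0, 2.0], t=5, condition=1, guidance_scale=2.0))
```

Because the same network supplies both predictions, no separate classifier has to be trained, which is consistent with the reduced training cost described above. The price is two denoiser evaluations per sampling step.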

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e43/11846686/2ea76599459a/bbaf071f1.jpg
