MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing, China.
School of Life Sciences and School of Medicine, Center for Synthetic and Systems Biology, Tsinghua University, Beijing, China.
Commun Biol. 2024 Aug 12;7(1):977. doi: 10.1038/s42003-024-06564-0.
A universal coordinate system that can integrate large numbers of cells and capture their heterogeneities is vital for constructing large-scale cell atlases as references for molecular and cellular studies. Studies have shown that cells exhibit multifaceted heterogeneities in their transcriptomic features at multiple resolutions. This complexity makes it hard to design a fixed coordinate system from a combination of known features. It is therefore desirable to build a learnable universal coordinate model that can capture the major heterogeneities and serve as a controlled generative model for data augmentation. We developed UniCoord, a specially tuned joint-VAE model that represents single-cell transcriptomic data in a lower-dimensional latent space with high interpretability. Each latent dimension can represent either a discrete or a continuous feature, and can be either supervised by prior knowledge or left unsupervised. The latent dimensions can be easily reconfigured to generate pseudo transcriptomic profiles with desired properties. UniCoord can also be used as a pre-trained model to analyze new data containing unseen cell types, and thus can serve as a feasible framework for cell annotation and comparison. UniCoord provides a prototype for a learnable universal coordinate framework that enables better analysis and generation of cells with highly orchestrated functions and heterogeneities.
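The idea of a latent space that mixes continuous and discrete dimensions, and that can be reconfigured before generation, can be illustrated with a minimal sketch. This is not UniCoord's implementation: the abstract does not specify how the latent variables are sampled, so the Gaussian reparameterization and the Gumbel-softmax relaxation used here are assumptions based on common joint-VAE practice, and the names `sample_continuous`, `sample_discrete`, and `reconfigure` are hypothetical. The encoder and decoder networks are omitted; the encoder outputs are stand-in constants.

```python
import math
import random


def sample_continuous(mu, log_var, rng):
    """Reparameterized Gaussian sample: z = mu + sigma * eps, eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def sample_discrete(logits, temperature, rng):
    """Gumbel-softmax relaxation of a categorical sample (assumed, not from the paper)."""
    noisy = []
    for logit in logits:
        u = rng.uniform(1e-10, 1.0 - 1e-10)   # keep u strictly inside (0, 1)
        gumbel = -math.log(-math.log(u))      # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    top = max(noisy)                          # subtract max for numerical stability
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def reconfigure(z_cont, z_disc, category):
    """Keep the continuous coordinates and force the discrete block to one category."""
    one_hot = [1.0 if k == category else 0.0 for k in range(len(z_disc))]
    return list(z_cont) + one_hot


rng = random.Random(0)

# Stand-in encoder outputs for one cell: 3 continuous dimensions and
# one discrete dimension with 5 categories (e.g. a hypothetical cell-type label).
z_cont = sample_continuous([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], rng)
z_disc = sample_discrete([0.0] * 5, temperature=0.5, rng=rng)

# Reconfigure the latent code so that the discrete dimension takes category 2;
# a trained decoder would map z_new to a pseudo transcriptomic profile.
z_new = reconfigure(z_cont, z_disc, category=2)
```

In a trained model, `z_new` would be passed to the decoder to generate a profile with the requested discrete property while the continuous coordinates stay unchanged.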