Cui Xuejian, Chen Xiaoyang, Li Zhen, Gao Zijing, Chen Shengquan, Jiang Rui
Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China.
School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China.
Nat Comput Sci. 2024 May;4(5):346-359. doi: 10.1038/s43588-024-00625-4. Epub 2024 May 10.
Single-cell epigenomic data have been growing at an unprecedented pace, but their high dimensionality and sparsity pose substantial challenges to downstream analysis. Although deep learning models, especially variational autoencoders, have been widely used to capture low-dimensional feature embeddings, the prevalent Gaussian assumption somewhat disagrees with real data, and these models tend to struggle to incorporate reference information from abundant cell atlases. Here we propose CASTLE, a deep generative model based on the vector-quantized variational autoencoder framework to extract discrete latent embeddings that interpretably characterize single-cell chromatin accessibility sequencing data. We validate the performance and robustness of CASTLE for accurate cell-type identification and reasonable visualization compared with state-of-the-art methods. We demonstrate the advantages of CASTLE for effective incorporation of existing massive reference datasets in a weakly supervised or supervised manner. We further demonstrate CASTLE's capacity for intuitively distilling cell-type-specific feature spectra that unveil cell heterogeneity and biological implications quantitatively.
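To illustrate what "discrete latent embeddings" means in a vector-quantized variational autoencoder, the sketch below shows only the generic quantization step: each continuous encoder output is replaced by its nearest vector in a codebook. This is not CASTLE's implementation. The function name `quantize`, the toy codebook, and the toy latent vectors are all assumptions made for illustration, and a real model would learn both the encoder and the codebook during training.

```python
import numpy as np

def quantize(z, codebook):
    """Assign each row of z to its nearest codebook vector (Euclidean distance).

    z        : (n_cells, dim) continuous latent vectors (hypothetical encoder output)
    codebook : (n_codes, dim) codebook vectors (hypothetical, normally learned)
    Returns the code index per cell and the quantized latent vectors.
    """
    # Squared distance between every cell and every code: shape (n_cells, n_codes)
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)        # discrete code assignment per cell
    return idx, codebook[idx]      # indices and the vectors they select

# Toy values chosen only for illustration
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
z = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7], [0.7, 0.4]])

idx, z_q = quantize(z, codebook)
usage = np.bincount(idx, minlength=len(codebook))  # how often each code is used
```

Here `idx` is the discrete representation of each cell, and `usage` counts how many cells map to each code, which is the kind of per-code summary that makes a discrete latent space easier to inspect than a continuous Gaussian one.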