Gao Yicheng, Dong Kejing, Shan Caihua, Li Dongsheng, Liu Qi
Shanghai Key Laboratory of Anesthesiology and Brain Functional Modulation, Clinical Research Center for Anesthesiology and Perioperative Medicine, Translational Research Institute of Brain and Brain-Like Intelligence, Shanghai Fourth People's Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China.
Department of Hematology, Tongji Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China.
Nat Commun. 2025 Jul 23;16(1):6775. doi: 10.1038/s41467-025-62008-1.
Disentanglement learning on single-cell omics data offers a promising alternative to traditional black-box representation learning by separating the semantic concepts embedded in a biological process. We present CausCell, which incorporates factual information about the causal relationships among disentangled concepts into a diffusion model to generate more reliable disentangled cellular representations. The aim is to improve the explainability, generalizability and controllability of single-cell data, including spatial-temporal omics data, relative to existing black-box representation learning models. We present two quantitative evaluation scenarios, disentanglement and reconstruction, which together form the first comprehensive benchmark for single-cell disentanglement learning; in both scenarios CausCell outperforms the state-of-the-art methods. Additionally, when given a causal structure, CausCell can perform controllable generation by intervening on the concepts of single-cell data. It also has the potential to uncover biological insights by generating counterfactuals from small and noisy single-cell datasets.
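The idea of intervening on concepts under a given causal structure can be illustrated with a minimal sketch. This is a toy linear structural causal model, not the CausCell method: CausCell uses a diffusion model, which is not reproduced here. The concept names (`time_point`, `cell_state`, `expression`), the edges, and the weights are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Toy linear structural causal model (SCM); NOT the CausCell implementation.
# All names, edges, and weights below are hypothetical and purely illustrative.

# Causal graph as {node: {parent: weight}}, listed in topological order.
PARENTS = {
    "time_point": {},
    "cell_state": {"time_point": 0.5},
    "expression": {"time_point": 0.5, "cell_state": 2.0},
}


def abduct_noise(observed):
    """Recover each node's exogenous noise term from a fully observed sample."""
    noise = {}
    for node, parents in PARENTS.items():
        predicted = sum(w * observed[p] for p, w in parents.items())
        noise[node] = observed[node] - predicted
    return noise


def counterfactual(observed, interventions):
    """Abduction, action, prediction: keep noise fixed, apply do(), recompute the rest."""
    noise = abduct_noise(observed)
    values = {}
    for node, parents in PARENTS.items():  # dicts keep insertion (topological) order
        if node in interventions:
            # do(): the intervened node ignores its parents and its noise term.
            values[node] = interventions[node]
        else:
            values[node] = sum(w * values[p] for p, w in parents.items()) + noise[node]
    return values


observed_cell = {"time_point": 1.0, "cell_state": 1.0, "expression": 3.0}
cf_cell = counterfactual(observed_cell, {"time_point": 2.0})
print(cf_cell)
```

In this sketch, setting `time_point` changes its descendants `cell_state` and `expression`, whereas intervening on `cell_state` leaves its parent `time_point` untouched. That asymmetry is what a causal structure adds over treating the concepts as independent factors.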