Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Department of Pathology, Molecular and Cell-Based Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Nat Commun. 2023 Mar 11;14(1):1350. doi: 10.1038/s41467-023-36961-8.
We introduce UniCell: Deconvolve Base (UCDBase), a pre-trained, interpretable deep learning model that deconvolves cell type fractions and predicts cell identity across spatial, bulk RNA-Seq, and scRNA-Seq datasets without contextualized reference data. UCD is trained on 10 million pseudo-mixtures from a fully integrated scRNA-Seq training database comprising over 28 million annotated single cells spanning 840 unique cell types from 898 studies. We show that our UCDBase and transfer-learning models achieve comparable or superior performance on in-silico mixture deconvolution relative to existing reference-based, state-of-the-art methods. Feature attribute analysis uncovers gene signatures associated with cell-type-specific inflammatory-fibrotic responses in ischemic kidney injury, discerns cancer subtypes, and accurately deconvolves tumor microenvironments. UCD identifies pathologic changes in cell fractions among bulk RNA-Seq data for several disease states. Applied to lung cancer scRNA-Seq data, UCD annotates cells and distinguishes normal from cancerous cells. Overall, UCD enhances transcriptomic data analysis, aiding in assessment of cellular and spatial context.
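The abstract does not describe how the pseudo-mixtures are built, so the following is only an illustrative sketch of the general idea, not the authors' procedure: draw random cell type fractions, sample annotated single cells to match them, and sum their expression profiles into one mixture with known ground-truth fractions. The function name `make_pseudo_mixture`, its parameters, and the flat-Dirichlet choice for the fractions are all assumptions made for illustration.

```python
import random
from collections import Counter


def make_pseudo_mixture(expr, labels, n_cells=100, seed=None):
    """Build one pseudo-mixture from annotated single-cell profiles.

    expr   : list of per-cell expression vectors (one list of numbers per cell)
    labels : list of cell type labels, same length as expr
    Returns (mixture, fractions), where mixture is the summed expression
    vector and fractions maps each cell type to its true share of cells.
    Hypothetical helper for illustration; not the UCDBase training code.
    """
    rng = random.Random(seed)
    cell_types = sorted(set(labels))

    # Assumption: target fractions come from a flat Dirichlet, obtained
    # here by normalizing independent Gamma(1, 1) draws.
    draws = [rng.gammavariate(1.0, 1.0) for _ in cell_types]
    total = sum(draws)
    weights = [d / total for d in draws]

    # Assign each of the n_cells slots to a cell type according to the weights.
    picked = Counter(rng.choices(cell_types, weights=weights, k=n_cells))

    # Index the available cells by type so we can sample within each type.
    by_type = {ct: [i for i, lab in enumerate(labels) if lab == ct]
               for ct in cell_types}

    n_genes = len(expr[0])
    mixture = [0.0] * n_genes
    for ct, k in picked.items():
        # Sample k cells of this type with replacement and add their counts.
        for i in rng.choices(by_type[ct], k=k):
            for g in range(n_genes):
                mixture[g] += expr[i][g]

    # Ground-truth fractions are the realized shares, not the target weights.
    fractions = {ct: picked.get(ct, 0) / n_cells for ct in cell_types}
    return mixture, fractions


if __name__ == "__main__":
    # Toy data: two cell types, two genes.
    toy_expr = [[1, 0], [2, 0], [0, 3], [0, 4]]
    toy_labels = ["A", "A", "B", "B"]
    mix, frac = make_pseudo_mixture(toy_expr, toy_labels, n_cells=50, seed=0)
    print(mix, frac)
```

A deconvolution model would then be trained to recover `fractions` from `mixture`; repeating the call with different seeds yields as many training pairs as needed.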