Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data.

Affiliations

School of Computer Science, McGill University, Montreal, QC, Canada.

Harvard-MIT Health Sciences and Technology, Cambridge, MA, USA.

Publication information

Nat Commun. 2021 Sep 6;12(1):5261. doi: 10.1038/s41467-021-25534-2.

DOI:10.1038/s41467-021-25534-2

PMID:34489404

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8421403/

Abstract

The advent of single-cell RNA sequencing (scRNA-seq) technologies has revolutionized transcriptomic studies. However, large-scale integrative analysis of scRNA-seq data remains a challenge, largely due to unwanted batch effects and the limited transferability, interpretability, and scalability of existing computational methods. We present the single-cell Embedded Topic Model (scETM). Our key contribution is the use of a transferable neural-network-based encoder combined with an interpretable linear decoder via a matrix tri-factorization. In particular, scETM simultaneously learns an encoder network to infer cell-type mixtures and a set of highly interpretable gene embeddings, topic embeddings, and batch-effect linear intercepts from multiple scRNA-seq datasets. scETM is scalable to over 10^6 cells and confers remarkable cross-tissue and cross-species zero-shot transfer-learning performance. Using gene set enrichment analysis, we find that scETM-learned topics are enriched in biologically meaningful and disease-related pathways. Lastly, scETM enables the incorporation of known gene sets into the gene embeddings, thereby directly learning the associations between pathways and topics via the topic embeddings.
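
The encoder/decoder split described above can be illustrated with a minimal pure-Python sketch. It assumes, rather than reproduces from the paper, that the reconstructed expression profile of cell i is a softmax over genes of θ_i α ρ plus the batch intercept λ for that cell's batch, where θ (cells × topics) is the inferred topic mixture, α (topics × embedding dimension) holds the topic embeddings, ρ (embedding dimension × genes) holds the gene embeddings, and λ (batches × genes) holds the batch-effect intercepts. The symbols, the names `toy_encoder` and `decode`, and all matrix shapes are hypothetical and are not the API of any scETM package; the single linear layer in `toy_encoder` is only a stand-in for the neural-network encoder.

```python
import math
import random


def matmul(a, b):
    """Dense matrix product for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def toy_encoder(x, weights):
    """Hypothetical stand-in encoder: one linear layer + softmax.

    x: cells x G expression matrix; weights: G x K.
    Returns a cells x K matrix of topic mixtures (rows sum to 1).
    """
    return [softmax(row) for row in matmul(x, weights)]


def decode(theta, alpha, rho, lam, batch):
    """Tri-factorized linear decoder (assumed form, for illustration).

    theta: cells x K topic mixtures
    alpha: K x L topic embeddings
    rho:   L x G gene embeddings
    lam:   n_batches x G batch-effect intercepts
    batch: batch index of each cell
    Returns a cells x G matrix of per-gene probabilities.
    """
    beta = matmul(alpha, rho)      # K x G topic-by-gene scores
    logits = matmul(theta, beta)   # cells x G, linear in theta
    # Add the intercept of each cell's batch, then normalize over genes.
    return [softmax([z + b for z, b in zip(row, lam[s])])
            for row, s in zip(logits, batch)]


def rand_matrix(rows, cols):
    """Random Gaussian matrix used as placeholder parameters."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n_cells, n_genes, n_topics, emb_dim, n_batches = 4, 6, 3, 2, 2

x = rand_matrix(n_cells, n_genes)
theta = toy_encoder(x, rand_matrix(n_genes, n_topics))
alpha = rand_matrix(n_topics, emb_dim)
rho = rand_matrix(emb_dim, n_genes)
lam = rand_matrix(n_batches, n_genes)
batch = [0, 0, 1, 1]

probs = decode(theta, alpha, rho, lam, batch)
for row in probs:
    print([round(p, 3) for p in row])
```

Because nothing nonlinear sits between θ and the final softmax, each row of α ρ can be read directly as a topic's gene scores, which is the sense in which such a decoder is interpretable. Under the same assumptions, replacing ρ with a fixed binary pathway-by-gene membership matrix would make each entry of α a direct topic-pathway association, mirroring the abstract's description of incorporating known gene sets.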
