Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data.

Affiliations

School of Computer Science, McGill University, Montreal, QC, Canada.

Harvard-MIT Health Sciences and Technology, Cambridge, MA, USA.

Publication information

Nat Commun. 2021 Sep 6;12(1):5261. doi: 10.1038/s41467-021-25534-2.

DOI:10.1038/s41467-021-25534-2
PMID:34489404
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8421403/
Abstract

The advent of single-cell RNA sequencing (scRNA-seq) technologies has revolutionized transcriptomic studies. However, large-scale integrative analysis of scRNA-seq data remains a challenge largely due to unwanted batch effects and the limited transferability, interpretability, and scalability of the existing computational methods. We present single-cell Embedded Topic Model (scETM). Our key contribution is the utilization of a transferable neural-network-based encoder while having an interpretable linear decoder via a matrix tri-factorization. In particular, scETM simultaneously learns an encoder network to infer cell type mixture and a set of highly interpretable gene embeddings, topic embeddings, and batch-effect linear intercepts from multiple scRNA-seq datasets. scETM is scalable to over 10^6 cells and confers remarkable cross-tissue and cross-species zero-shot transfer-learning performance. Using gene set enrichment analysis, we find that scETM-learned topics are enriched in biologically meaningful and disease-related pathways. Lastly, scETM enables the incorporation of known gene sets into the gene embeddings, thereby directly learning the associations between pathways and topics via the topic embeddings.
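
The "interpretable linear decoder via a matrix tri-factorization" can be pictured with a small sketch. The abstract does not spell out the formulas, so everything here is an assumption made for illustration: the shapes, the softmax over genes, the way the batch intercept is added, and all names (`theta`, `alpha`, `rho`, `lam`, `decode`). The neural-network encoder is skipped entirely; random topic mixtures stand in for its output. Only the standard library is used so the sketch runs without extra packages.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain nested-list matrix product (rows of a times columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def decode(theta, alpha, rho, batch_intercepts, batch_ids):
    """Hypothetical linear decoder: cell-topic x topic-embedding x gene-embedding.

    theta: cells x topics (each row a topic mixture)
    alpha: topics x embedding_dim (topic embeddings)
    rho:   embedding_dim x genes (gene embeddings)
    batch_intercepts: batches x genes (one linear intercept per batch and gene)
    batch_ids: batch index of each cell
    """
    topic_gene = matmul(alpha, rho)      # topics x genes, readable per topic
    logits = matmul(theta, topic_gene)   # cells x genes
    out = []
    for row, b in zip(logits, batch_ids):
        shifted = [v + lam_g for v, lam_g in zip(row, batch_intercepts[b])]
        out.append(softmax(shifted))     # assumed: normalize over genes
    return out


def random_matrix(rng, n_rows, n_cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]


rng = random.Random(0)
n_cells, n_topics, emb_dim, n_genes, n_batches = 5, 3, 4, 6, 2
# Stand-in for the encoder output: random mixtures that sum to 1 per cell.
theta = [softmax(r) for r in random_matrix(rng, n_cells, n_topics)]
alpha = random_matrix(rng, n_topics, emb_dim)
rho = random_matrix(rng, emb_dim, n_genes)
lam = random_matrix(rng, n_batches, n_genes)
batch_ids = [0, 0, 1, 1, 0]

expected = decode(theta, alpha, rho, lam, batch_ids)
```

Because the decoder is linear before the final normalization, the product of `alpha` and `rho` gives one score per topic and gene, which is what makes it possible to read off topic-level gene signatures in a model of this shape.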

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5818/8421403/72b41ba8ec18/41467_2021_25534_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5818/8421403/aaa0df4a394e/41467_2021_25534_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5818/8421403/497f0f57c00a/41467_2021_25534_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5818/8421403/bb90f77f9a4b/41467_2021_25534_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5818/8421403/557be6ec87b9/41467_2021_25534_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5818/8421403/1850b4f9cf27/41467_2021_25534_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5818/8421403/8fda8898031a/41467_2021_25534_Fig7_HTML.jpg

Similar articles

1
Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data.
Nat Commun. 2021 Sep 6;12(1):5261. doi: 10.1038/s41467-021-25534-2.
2
DeepGSEA: explainable deep gene set enrichment analysis for single-cell transcriptomic data.
Bioinformatics. 2024 Jul 1;40(7). doi: 10.1093/bioinformatics/btae434.
3
Cross-species cell-type assignment from single-cell RNA-seq data by a heterogeneous graph neural network.
Genome Res. 2023 Jan;33(1):96-111. doi: 10.1101/gr.276868.122. Epub 2022 Dec 16.
4
Learning deep features and topological structure of cells for clustering of scRNA-sequencing data.
Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac068.
5
Decomposing Cell Identity for Transfer Learning across Cellular Measurements, Platforms, Tissues, and Species.
Cell Syst. 2019 May 22;8(5):395-411.e8. doi: 10.1016/j.cels.2019.04.004.
6
Deconvolution of autoencoders to learn biological regulatory modules from single cell mRNA sequencing data.
BMC Bioinformatics. 2019 Jul 8;20(1):379. doi: 10.1186/s12859-019-2952-9.
7
Accurate and interpretable gene expression imputation on scRNA-seq data using IGSimpute.
Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad124.
8
Single-Cell Capture, RNA-seq, and Transcriptome Analysis from the Neural Retina.
Methods Mol Biol. 2020;2092:159-186. doi: 10.1007/978-1-0716-0175-4_12.
9
Latent cellular analysis robustly reveals subtle diversity in large-scale single-cell RNA-seq data.
Nucleic Acids Res. 2019 Dec 16;47(22):e143. doi: 10.1093/nar/gkz826.
10
Single-Cell RNA Sequencing Analysis: A Step-by-Step Overview.
Methods Mol Biol. 2021;2284:343-365. doi: 10.1007/978-1-0716-1307-8_19.

Cited by

1
A message passing framework for precise cell state identification with scClassify2.
Genome Biol. 2025 Aug 19;26(1):252. doi: 10.1186/s13059-025-03722-3.
2
iGTP: learning interpretable cellular embedding for inferring biological mechanisms underlying single-cell transcriptomics.
Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf296.
3
Artificial intelligence approaches for tumor phenotype stratification from single-cell transcriptomic data.
Elife. 2025 Jun 13;13:RP98469. doi: 10.7554/eLife.98469.
4
BuDDI: Bulk Deconvolution with Domain Invariance to predict cell-type-specific perturbations from bulk.
PLoS Comput Biol. 2025 Jan 17;21(1):e1012742. doi: 10.1371/journal.pcbi.1012742. eCollection 2025 Jan.
5
scGO: interpretable deep neural network for cell status annotation and disease diagnosis.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf018.
6
DeepQA: A Unified Transcriptome-Based Aging Clock Using Deep Neural Networks.
Aging Cell. 2025 May;24(5):e14471. doi: 10.1111/acel.14471. Epub 2025 Jan 5.
7
scGraph2Vec: a deep generative model for gene embedding augmented by graph neural network and single-cell omics data.
Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giae108.
8
Designing interpretable deep learning applications for functional genomics: a quantitative analysis.
Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae449.
9
A best-match approach for gene set analyses in embedding spaces.
Genome Res. 2024 Oct 11;34(9):1421-1433. doi: 10.1101/gr.279141.124.
10
Inferring pattern-driving intercellular flows from single-cell and spatial transcriptomics.
Nat Methods. 2024 Oct;21(10):1806-1817. doi: 10.1038/s41592-024-02380-w. Epub 2024 Aug 26.

References

1
Celda: a Bayesian model to perform co-clustering of genes into modules and cells into subpopulations using single-cell RNA-seq data.
NAR Genom Bioinform. 2022 Sep 13;4(3):lqac066. doi: 10.1093/nargab/lqac066. eCollection 2022 Sep.
2
A taxonomy of transcriptomic cell types across the isocortex and hippocampal formation.
Cell. 2021 Jun 10;184(12):3222-3241.e26. doi: 10.1016/j.cell.2021.04.021. Epub 2021 May 17.
3
Joint probabilistic modeling of single-cell multi-omic data with totalVI.
Nat Methods. 2021 Mar;18(3):272-282. doi: 10.1038/s41592-020-01050-x. Epub 2021 Feb 15.
4
Update on GPCR-based targets for the development of novel antidepressants.
Mol Psychiatry. 2022 Jan;27(1):534-558. doi: 10.1038/s41380-021-01040-1. Epub 2021 Feb 15.
5
The Gene Ontology resource: enriching a GOld mine.
Nucleic Acids Res. 2021 Jan 8;49(D1):D325-D334. doi: 10.1093/nar/gkaa1113.
6
Mouse Genome Database (MGD): Knowledgebase for mouse-human comparative biology.
Nucleic Acids Res. 2021 Jan 8;49(D1):D981-D987. doi: 10.1093/nar/gkaa1083.
7
Deep feature extraction of single-cell transcriptomes by generative adversarial network.
Bioinformatics. 2021 Jun 16;37(10):1345-1351. doi: 10.1093/bioinformatics/btaa976.
8
MARS: discovering novel cell types across heterogeneous single-cell experiments.
Nat Methods. 2020 Dec;17(12):1200-1206. doi: 10.1038/s41592-020-00979-3. Epub 2020 Oct 19.
9
Cumulus provides cloud-based data analysis for large-scale single-cell and single-nucleus RNA-seq.
Nat Methods. 2020 Aug;17(8):793-798. doi: 10.1038/s41592-020-0905-x. Epub 2020 Jul 27.
10
Inferring multimodal latent topics from electronic health records.
Nat Commun. 2020 May 21;11(1):2536. doi: 10.1038/s41467-020-16378-3.