Imputing abundance of over 2,500 surface proteins from single-cell transcriptomes with context-agnostic zero-shot deep ensembles.

Affiliations

Department of Pharmacology and Toxicology, Michigan State University, East Lansing, MI 48824, USA.

Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA.

Publication information

Cell Syst. 2024 Sep 18;15(9):869-884.e6. doi: 10.1016/j.cels.2024.08.006. Epub 2024 Sep 6.

Abstract

Cell surface proteins serve as primary drug targets and cell identity markers. Techniques such as CITE-seq (cellular indexing of transcriptomes and epitopes by sequencing) have enabled the simultaneous quantification of surface protein abundance and transcript expression within individual cells. The published data have been utilized to train machine learning models for predicting surface protein abundance solely from transcript expression. However, the small scale of proteins predicted and the poor generalization ability of these computational approaches across diverse contexts (e.g., different tissues/disease states) impede their widespread adoption. Here, we propose SPIDER (surface protein prediction using deep ensembles from single-cell RNA sequencing), a context-agnostic zero-shot deep ensemble model, which enables large-scale protein abundance prediction and generalizes better to various contexts. Comprehensive benchmarking shows that SPIDER outperforms other state-of-the-art methods. Using the predicted surface abundance of >2,500 proteins from single-cell transcriptomes, we demonstrate the broad applications of SPIDER, including cell type annotation, biomarker/target identification, and cell-cell interaction analysis in hepatocellular carcinoma and colorectal cancer. A record of this paper's transparent peer review process is included in the supplemental information.
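The abstract does not describe SPIDER's architecture, so the following is only a minimal sketch of the general deep-ensemble idea it names: train several small networks from different random initializations, then average their predictions. Everything here is an assumption made for illustration, including the synthetic "transcript" matrix, the single simulated "protein" target, the one-hidden-layer network, and the function names `train_mlp`, `predict_mlp`, and `ensemble_predict`. None of it is taken from the paper or its code.

```python
import numpy as np

def train_mlp(X, y, hidden=16, epochs=200, lr=0.05, seed=0):
    """Train a one-hidden-layer regression MLP with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)            # hidden activations
        pred = h @ W2 + b2                  # predicted abundance
        err = (pred - y) / n                # gradient of 0.5 * MSE w.r.t. pred
        dh = (err @ W2.T) * (1.0 - h ** 2)  # backpropagate through tanh
        W2 -= lr * (h.T @ err)
        b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X.T @ dh)
        b1 -= lr * dh.sum(axis=0)
    return W1, b1, W2, b2

def predict_mlp(params, X):
    """Predict with a single ensemble member."""
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()

def ensemble_predict(members, X):
    """Return the ensemble mean and the across-member standard deviation."""
    preds = np.stack([predict_mlp(p, X) for p in members])  # (n_members, n_cells)
    return preds.mean(axis=0), preds.std(axis=0)

# Synthetic stand-in data: 200 "cells" x 10 "genes", one simulated protein target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
true_w = rng.normal(size=10)
y = np.tanh(X @ true_w) + 0.1 * rng.normal(size=200)

X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

# Members differ only in their random initialization (the seed).
members = [train_mlp(X_train, y_train, seed=s) for s in range(5)]
mean_pred, spread = ensemble_predict(members, X_test)

member_mse = [np.mean((predict_mlp(p, X_test) - y_test) ** 2) for p in members]
ensemble_mse = np.mean((mean_pred - y_test) ** 2)
print("mean member MSE:", np.mean(member_mse))
print("ensemble MSE:   ", ensemble_mse)
```

Because squared error is convex, the error of the averaged prediction can never exceed the average error of the individual members. The across-member spread is sometimes used as a rough indicator of how much the members disagree on a given cell.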

Similar articles

1
Imputing abundance of over 2,500 surface proteins from single-cell transcriptomes with context-agnostic zero-shot deep ensembles.
Cell Syst. 2024 Sep 18;15(9):869-884.e6. doi: 10.1016/j.cels.2024.08.006. Epub 2024 Sep 6.
4
Ensemble learning models that predict surface protein abundance from single-cell multimodal omics data.
Methods. 2021 May;189:65-73. doi: 10.1016/j.ymeth.2020.10.001. Epub 2020 Oct 9.
5
Surface protein imputation from single cell transcriptomes by deep neural networks.
Nat Commun. 2020 Jan 31;11(1):651. doi: 10.1038/s41467-020-14391-0.
6
scDM: A deep generative method for cell surface protein prediction with diffusion model.
J Mol Biol. 2024 Jun 15;436(12):168610. doi: 10.1016/j.jmb.2024.168610. Epub 2024 May 15.
9
A hybrid deep clustering approach for robust cell type profiling using single-cell RNA-seq data.
RNA. 2020 Oct;26(10):1303-1319. doi: 10.1261/rna.074427.119. Epub 2020 Jun 12.
10
Simultaneous Measurement of Surface Proteins and Gene Expression from Single Cells.
Methods Mol Biol. 2020;2111:35-46. doi: 10.1007/978-1-0716-0266-9_3.

Cited by

1
DGAT: A Dual-Graph Attention Network for Inferring Spatial Protein Landscapes from Transcriptomics.
bioRxiv. 2025 Jul 9:2025.07.05.662121. doi: 10.1101/2025.07.05.662121.
