Department of Pharmacology and Toxicology, Michigan State University, East Lansing, MI 48824, USA.
Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA.
Cell Syst. 2024 Sep 18;15(9):869-884.e6. doi: 10.1016/j.cels.2024.08.006. Epub 2024 Sep 6.
Cell surface proteins serve as primary drug targets and cell identity markers. Techniques such as CITE-seq (cellular indexing of transcriptomes and epitopes by sequencing) have enabled the simultaneous quantification of surface protein abundance and transcript expression within individual cells. Published CITE-seq data have been used to train machine learning models that predict surface protein abundance solely from transcript expression. However, the small number of proteins predicted and the poor generalization of these computational approaches across diverse contexts (e.g., different tissues/disease states) impede their widespread adoption. Here, we propose SPIDER (surface protein prediction using deep ensembles from single-cell RNA sequencing), a context-agnostic zero-shot deep ensemble model, which enables large-scale protein abundance prediction and generalizes better to various contexts. Comprehensive benchmarking shows that SPIDER outperforms other state-of-the-art methods. Using the predicted surface abundance of >2,500 proteins from single-cell transcriptomes, we demonstrate the broad applications of SPIDER, including cell type annotation, biomarker/target identification, and cell-cell interaction analysis in hepatocellular carcinoma and colorectal cancer. A record of this paper's transparent peer review process is included in the supplemental information.
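As a rough illustration of the general deep-ensemble idea named in the abstract, the sketch below trains several small neural-network regressors from different random initializations and averages their predictions. It is not SPIDER's architecture or training procedure: the toy data, the one-hidden-layer network, and every name here (`train_mlp`, `predict_mlp`, `ensemble_predict`) are assumptions made purely for illustration.

```python
import numpy as np

def train_mlp(X, y, hidden=16, epochs=500, lr=0.1, seed=0):
    """Fit a one-hidden-layer regression MLP with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)               # hidden activations, (n, hidden)
        err = H @ W2 + b2 - y                  # residuals, (n,)
        # Gradients of 0.5 * mean squared error
        gW2 = H.T @ err / n
        gb2 = err.mean()
        gH = np.outer(err, W2) * (1.0 - H**2)  # backprop through tanh
        gW1 = X.T @ gH / n
        gb1 = gH.mean(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, b1, W2, b2

def predict_mlp(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

def ensemble_predict(members, X):
    """Average member predictions; their spread is a crude uncertainty proxy."""
    preds = np.stack([predict_mlp(p, X) for p in members])  # (n_members, n_cells)
    return preds.mean(axis=0), preds.std(axis=0)

# Toy synthetic data (an assumption, not real CITE-seq measurements):
# 200 "cells" x 5 "genes", and one "protein abundance" value per cell.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

# Ensemble members differ only in their random initialization (seed).
members = [train_mlp(X, y, seed=s) for s in range(5)]
mean_pred, spread = ensemble_predict(members, X)

member_mses = [np.mean((predict_mlp(p, X) - y) ** 2) for p in members]
ensemble_mse = np.mean((mean_pred - y) ** 2)
print(f"mean member MSE: {np.mean(member_mses):.4f}, ensemble MSE: {ensemble_mse:.4f}")
```

Because squared error is convex, the averaged prediction can never have a higher mean squared error than the average of its members, which is one common motivation for ensembling. The spread across members gives a simple per-cell indication of how much the members disagree.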