Zhang Xinyi, Tseo Yitong, Bai Yunhao, Chen Fei, Uhler Caroline
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Nat Methods. 2025 May 13. doi: 10.1038/s41592-025-02696-1.
The subcellular localization of a protein is important for its function, and its mislocalization is linked to numerous diseases. Existing datasets capture limited pairs of proteins and cell lines, and existing protein localization prediction models either miss cell-type specificity or cannot generalize to unseen proteins. Here we present a method for Prediction of Unseen Proteins' Subcellular localization (PUPS). PUPS combines a protein language model and an image inpainting model to utilize both protein sequence and cellular images. We demonstrate that the protein sequence input enables generalization to unseen proteins, and the cellular image input captures single-cell variability, enabling cell-type-specific predictions. Experimental validation shows that PUPS can predict protein localization in newly performed experiments outside the Human Protein Atlas used for training. Collectively, PUPS provides a framework for predicting differential protein localization across cell lines and single cells within a cell line, including changes in protein localization driven by mutations.