

Prediction of protein subcellular localization in single cells.

Author information

Zhang Xinyi, Tseo Yitong, Bai Yunhao, Chen Fei, Uhler Caroline

Affiliations

Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, U.S.A.

Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, U.S.A.

Publication information

bioRxiv. 2024 Jul 25:2024.07.25.605178. doi: 10.1101/2024.07.25.605178.

Abstract

The subcellular localization of a protein is important for its function and its interactions with other molecules, and mislocalization is linked to numerous diseases. While atlas-scale efforts have profiled protein localization across various cell lines, existing datasets contain only a limited number of protein and cell line pairs and do not cover all human proteins. We present a method that uses both protein sequences and cellular landmark images to perform Predictions of Unseen Proteins' Subcellular localization (PUPS), and which generalizes to both proteins and cell lines not used for model training. PUPS combines a protein language model and an image inpainting model to use both protein sequence and cellular images for protein localization prediction. The protein sequence input enables generalization to unseen proteins, and the cellular image input enables cell-type-specific prediction that captures single-cell variability. PUPS' ability to generalize to unseen proteins and cell lines enables us to assess the variability in protein localization across cell lines, as well as across single cells within a cell line, and to identify the biological processes associated with the proteins that have variable localization. Experimental validation shows that PUPS can predict protein localization in newly performed experiments outside of the Human Protein Atlas used for training. Collectively, PUPS uses both protein sequences and cellular images to predict protein localization in unseen proteins and cell lines, with the ability to capture single-cell variability.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc72/11291118/025e38ffc3d1/nihpp-2024.07.25.605178v1-f0001.jpg
