Prediction of protein subcellular localization in single cells.

Authors

Zhang Xinyi, Tseo Yitong, Bai Yunhao, Chen Fei, Uhler Caroline

Affiliations

Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, U.S.A.

Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, U.S.A.

Publication information

bioRxiv. 2024 Jul 25:2024.07.25.605178. doi: 10.1101/2024.07.25.605178.

DOI:10.1101/2024.07.25.605178
PMID:39091825
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11291118/
Abstract

The subcellular localization of a protein is important for its function and interaction with other molecules, and its mislocalization is linked to numerous diseases. While atlas-scale efforts have been made to profile protein localization across various cell lines, existing datasets only contain limited pairs of proteins and cell lines which do not cover all human proteins. We present a method that uses both protein sequences and cellular landmark images to perform Predictions of Unseen Proteins' Subcellular localization (PUPS), which can generalize to both proteins and cell lines not used for model training. PUPS combines a protein language model and an image inpainting model to utilize both protein sequence and cellular images for protein localization prediction. The protein sequence input enables generalization to unseen proteins and the cellular image input enables cell type specific prediction that captures single-cell variability. PUPS' ability to generalize to unseen proteins and cell lines enables us to assess the variability in protein localization across cell lines as well as across single cells within a cell line and to identify the biological processes associated with the proteins that have variable localization. Experimental validation shows that PUPS can be used to predict protein localization in newly performed experiments outside of the Human Protein Atlas used for training. Collectively, PUPS utilizes both protein sequences and cellular images to predict protein localization in unseen proteins and cell lines with the ability to capture single-cell variability.
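
To make the two-input idea in the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the PUPS architecture: the amino-acid composition vector is only a stand-in for a protein language model embedding, and a per-pixel linear mix of landmark channels is only a stand-in for an image inpainting network. The names `embed_sequence` and `ToyLocalizationModel`, the three landmark channels, the 8x8 image size, and the random untrained weights are all assumptions made for illustration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed_sequence(seq: str) -> np.ndarray:
    """Hypothetical stand-in for a protein language model.

    Returns the 20-dimensional amino-acid composition of the sequence.
    """
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / max(len(seq), 1)


class ToyLocalizationModel:
    """Toy fusion of a sequence embedding with landmark images (untrained)."""

    def __init__(self, n_landmarks: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Maps the sequence embedding to one mixing weight per landmark
        # channel, plus one bias term (last row).
        self.W = rng.normal(size=(n_landmarks + 1, len(AMINO_ACIDS)))

    def predict(self, seq: str, landmarks: np.ndarray) -> np.ndarray:
        """landmarks has shape (C, H, W); returns an (H, W) map in (0, 1)."""
        coeffs = self.W @ embed_sequence(seq)          # shape (C + 1,)
        # Sequence-dependent weighted sum over landmark channels.
        logits = np.tensordot(coeffs[:-1], landmarks, axes=1) + coeffs[-1]
        return 1.0 / (1.0 + np.exp(-logits))           # per-pixel sigmoid


# Two hypothetical single cells, each with 3 landmark channels of 8x8 pixels.
rng = np.random.default_rng(1)
cell_a = rng.random((3, 8, 8))
cell_b = rng.random((3, 8, 8))

model = ToyLocalizationModel(n_landmarks=3)
pred_a = model.predict("MKTAYIAKQR", cell_a)
pred_b = model.predict("MKTAYIAKQR", cell_b)
pred_other = model.predict("GGSLPWWDEE", cell_a)
```

In this sketch the same sequence yields different maps for `cell_a` and `cell_b`, and different sequences yield different maps for the same cell. That mirrors, in a very reduced form, why the abstract pairs a sequence input (to generalize across proteins) with an image input (to capture single-cell variability).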

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc72/11291118/025e38ffc3d1/nihpp-2024.07.25.605178v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc72/11291118/7139abb4b86a/nihpp-2024.07.25.605178v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc72/11291118/4437fada3b95/nihpp-2024.07.25.605178v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc72/11291118/5c42d6396bc1/nihpp-2024.07.25.605178v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc72/11291118/a66f0368af2e/nihpp-2024.07.25.605178v1-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc72/11291118/e9b4af8d88dd/nihpp-2024.07.25.605178v1-f0006.jpg

Similar articles

1
Prediction of protein subcellular localization in single cells.
bioRxiv. 2024 Jul 25:2024.07.25.605178. doi: 10.1101/2024.07.25.605178.
2
Prediction of protein subcellular localization in single cells.
Nat Methods. 2025 May 13. doi: 10.1038/s41592-025-02696-1.
3
Deep generative model for protein subcellular localization prediction.
Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf152.
4
CellCircLoc: Deep Neural Network for Predicting and Explaining Cell Line-Specific CircRNA Subcellular Localization.
IEEE J Biomed Health Inform. 2025 Feb;29(2):1494-1503. doi: 10.1109/JBHI.2024.3491732. Epub 2025 Feb 10.
5
Use of Chou's 5-steps rule to predict the subcellular localization of gram-negative and gram-positive bacterial proteins by multi-label learning based on gene ontology annotation and profile alignment.
J Integr Bioinform. 2020 Jun 29;18(1):51-79. doi: 10.1515/jib-2019-0091.
6
ProLoc-GO: utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization.
BMC Bioinformatics. 2008 Feb 1;9:80. doi: 10.1186/1471-2105-9-80.
7
Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing.
BMC Bioinformatics. 2012;13 Suppl 17(Suppl 17):S13. doi: 10.1186/1471-2105-13-S17-S13. Epub 2012 Dec 13.
8
Prediction of Protein Subcellular Localization Based on Fusion of Multi-view Features.
Molecules. 2019 Mar 6;24(5):919. doi: 10.3390/molecules24050919.
9
Ensemble machine learning model trained on a new synthesized dataset generalizes well for stress prediction using wearable devices.
J Biomed Inform. 2023 Dec;148:104556. doi: 10.1016/j.jbi.2023.104556. Epub 2023 Dec 2.
10
Prediction of protein subcellular localization.
Proteins. 2006 Aug 15;64(3):643-51. doi: 10.1002/prot.21018.
