Consistency and variation of protein subcellular location annotations.

Authors

Xu Ying-Ying, Zhou Hang, Murphy Robert F, Shen Hong-Bin

Affiliations

School of Biomedical Engineering, Southern Medical University, Guangzhou, China.

Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China.

Publication information

Proteins. 2021 Feb;89(2):242-250. doi: 10.1002/prot.26010. Epub 2020 Sep 26.

DOI: 10.1002/prot.26010

PMID: 32935893

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7790864/

Abstract

A major challenge for protein databases is reconciling information from diverse sources. This is especially difficult when some information consists of secondary, human-interpreted rather than primary data. For example, the Swiss-Prot database contains curated annotations of subcellular location that are based on predictions from protein sequence, statements in scientific articles, and published experimental evidence. The Human Protein Atlas (HPA) consists of millions of high-resolution microscopy images that show protein spatial distribution at the cellular and subcellular level. These images are manually annotated with protein subcellular locations by trained experts. The image annotations in HPA can capture the variation of subcellular location across different cell lines, tissues, or tissue states. Systematic investigation of the consistency between HPA and Swiss-Prot assignments of subcellular location, which is important for understanding and utilizing protein location data from the two databases, has not been described previously. In this paper, we quantitatively evaluate the consistency of subcellular location annotations between HPA and Swiss-Prot at multiple levels, as well as the variation of protein locations across cell lines and tissues. Our results show that the annotations of these two databases differ significantly in many cases, and we propose procedures for deriving and integrating protein subcellular location data from them. We also find that proteins with highly variable locations are more likely to be disease biomarkers, which supports incorporating subcellular location analysis into protein biomarker identification and screening.
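The abstract does not state which agreement or variability measures were used. As a minimal sketch only, assuming both databases' labels have already been mapped onto one shared location vocabulary, per-protein agreement can be binned into exact, partial, or no overlap, and location variation across cell lines can be scored as one minus the mean pairwise Jaccard index. The function names, the three-level binning, the variability score, and the toy protein IDs and labels below are hypothetical illustrations, not the authors' method and not real HPA or Swiss-Prot records.

```python
"""Hypothetical sketch: compare protein location annotations from two sources.

Assumptions (not from the paper): labels are already mapped to a shared
vocabulary, and each source is a dict from protein ID to a set of labels.
"""
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index len(a & b) / len(a | b); 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def consistency_level(a: Set[str], b: Set[str]) -> str:
    """Bin two annotation sets into 'exact', 'partial', or 'none' agreement."""
    if a == b:
        return "exact"
    if a & b:
        return "partial"
    return "none"


def compare_sources(src1: Dict[str, Set[str]],
                    src2: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count agreement levels over proteins annotated in both sources."""
    counts = {"exact": 0, "partial": 0, "none": 0}
    for protein in src1.keys() & src2.keys():
        counts[consistency_level(src1[protein], src2[protein])] += 1
    return counts


def location_variability(per_cell_line: List[Set[str]]) -> float:
    """Hypothetical score: 1 - mean pairwise Jaccard over cell lines.

    0.0 means the same location set in every cell line; 1.0 means the
    location sets observed in different cell lines never overlap.
    """
    n = len(per_cell_line)
    if n < 2:
        return 0.0
    sims = [jaccard(per_cell_line[i], per_cell_line[j])
            for i in range(n) for j in range(i + 1, n)]
    return 1.0 - sum(sims) / len(sims)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    swiss_prot = {"P1": {"nucleus"}, "P2": {"cytosol", "nucleus"}, "P3": {"mitochondria"}}
    hpa = {"P1": {"nucleus"}, "P2": {"nucleus", "plasma membrane"}, "P3": {"golgi apparatus"}}
    print(compare_sources(swiss_prot, hpa))  # {'exact': 1, 'partial': 1, 'none': 1}
    print(location_variability([{"nucleus"}, {"nucleus"}, {"cytosol"}]))
```

Sets are used rather than single labels because both resources can assign one protein to several compartments at once; a single-label comparison would hide partial agreement.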


Cited by

Prediction of protein subcellular localization in single cells.

Nat Methods. 2025 May 13. doi: 10.1038/s41592-025-02696-1.

Computational identification of surface markers for isolating distinct subpopulations from heterogeneous cancer cell populations.

NPJ Syst Biol Appl. 2024 Oct 17;10(1):120. doi: 10.1038/s41540-024-00441-6.

Prediction of protein subcellular localization in single cells.

bioRxiv. 2024 Jul 25:2024.07.25.605178. doi: 10.1101/2024.07.25.605178.

Computational identification of surface markers for isolating distinct subpopulations from heterogeneous cancer cell populations.

bioRxiv. 2024 Jun 2:2024.05.28.596337. doi: 10.1101/2024.05.28.596337.

Recent Advances in the Prediction of Subcellular Localization of Proteins and Related Topics.

Front Bioinform. 2022 May 19;2:910531. doi: 10.3389/fbinf.2022.910531. eCollection 2022.

Improving Protein Subcellular Location Classification by Incorporating Three-Dimensional Structure Information.

Biomolecules. 2021 Oct 29;11(11):1607. doi: 10.3390/biom11111607.

Convolutional Neural Network-Based Artificial Intelligence for Classification of Protein Localization Patterns.

Biomolecules. 2021 Feb 11;11(2):264. doi: 10.3390/biom11020264.
