Last rolls of the yoyo: Assessing the human canonical protein count.

Author information

Southan Christopher

Affiliation

IUPHAR/BPS Guide to Pharmacology, Centre for Integrative Physiology, University of Edinburgh, Edinburgh, EH8 9XD, UK.

Publication information

F1000Res. 2017 Apr 7;6:448. doi: 10.12688/f1000research.11119.1. eCollection 2017.

DOI: 10.12688/f1000research.11119.1
PMID: 28529709
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5428527/
Abstract

In 2004, when the protein estimate from the finished human genome was only 24,000, the surprise was compounded as reviewed estimates fell to 19,000 by 2014. However, variability in the total canonical protein counts (i.e. excluding alternative splice forms) of open reading frames (ORFs) in different annotation portals persists. This work assesses these differences and possible causes. A 16-year analysis of Ensembl and UniProtKB/Swiss-Prot shows convergence to a protein number of ~20,000. The former had shown some yo-yoing, but both have now plateaued. Nine major annotation portals, reviewed at the beginning of 2017, gave a spread of counts from 21,819 down to 18,891. The 4-way cross-reference concordance (within UniProt) between Ensembl, Swiss-Prot, Entrez Gene and the Human Gene Nomenclature Committee (HGNC) drops to 18,690, indicating methodological differences in protein definitions and experimental existence support between sources. The Swiss-Prot and neXtProt evidence criteria include mass spectrometry peptide verification and also cross-references for antibody detection from the Human Protein Atlas. Notwithstanding, hundreds of Swiss-Prot entries are classified as non-coding biotypes by HGNC. The only inference that protein numbers might still rise comes from numerous reports of small ORF (smORF) discovery. However, while there have been recent cases of protein verifications from previous mis-annotation of non-coding RNA, very few have passed the Swiss-Prot curation and genome annotation thresholds. The post-genomic era has seen both advances in data generation and improvements in the human reference assembly. Notwithstanding, current numbers, while persistently discordant, show that the earlier yo-yoing has largely ceased. Given the importance to biology and biomedicine of defining the canonical human proteome, the task will need more collaborative inter-source curation combined with broader and deeper experimental confirmation of predicted proteins. The eventual closure could well be below ~19,000.

Similar articles

1
Last rolls of the yoyo: Assessing the human canonical protein count.
F1000Res. 2017 Apr 7;6:448. doi: 10.12688/f1000research.11119.1. eCollection 2017.
2
[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
Yi Chuan Xue Bao. 2004 May;31(5):431-43.
3
Annotating single amino acid polymorphisms in the UniProt/Swiss-Prot knowledgebase.
Hum Mutat. 2008 Mar;29(3):361-6. doi: 10.1002/humu.20671.
4
An enhanced workflow for variant interpretation in UniProtKB/Swiss-Prot improves consistency and reuse in ClinVar.
Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz040.
5
UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View.
Methods Mol Biol. 2016;1374:23-54. doi: 10.1007/978-1-4939-3167-5_2.
6
The UniProtKB/Swiss-Prot knowledgebase and its Plant Proteome Annotation Program.
J Proteomics. 2009 Apr 13;72(3):567-73. doi: 10.1016/j.jprot.2008.11.010. Epub 2008 Nov 24.
7
Database verification studies of SWISS-PROT and GenBank.
Bioinformatics. 2001 Jun;17(6):526-32; discussion 533-4. doi: 10.1093/bioinformatics/17.6.526.
8
Reassessing domain architecture evolution of metazoan proteins: major impact of gene prediction errors.
Genes (Basel). 2011 Jul 13;2(3):449-501. doi: 10.3390/genes2030449.
9
Genetic variations and diseases in UniProtKB/Swiss-Prot: the ins and outs of expert manual curation.
Hum Mutat. 2014 Aug;35(8):927-35. doi: 10.1002/humu.22594. Epub 2014 Jun 24.
10
UniProtKB/Swiss-Prot.
Methods Mol Biol. 2007;406:89-112. doi: 10.1007/978-1-59745-535-0_4.

Cited by

1
More than 2,500 coding genes in the human reference gene set still have unsettled status.
bioRxiv. 2024 Dec 9:2024.12.05.626965. doi: 10.1101/2024.12.05.626965.
2
Evidence for widespread translation of 5' untranslated regions.
Nucleic Acids Res. 2024 Aug 12;52(14):8112-8126. doi: 10.1093/nar/gkae571.
3
C and G are frequently mutated into T and A in coding regions of human genes.
