OpenProt: a more comprehensive guide to explore eukaryotic coding potential and proteomes.

Affiliations

Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada.

PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université de Lille, F-59000 Lille, France.

Publication information

Nucleic Acids Res. 2019 Jan 8;47(D1):D403-D410. doi: 10.1093/nar/gky936.

DOI: 10.1093/nar/gky936

PMID: 30299502

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6323990/

Abstract

Advances in proteomics and sequencing have highlighted many non-annotated open reading frames (ORFs) in eukaryotic genomes. Genome annotations, cornerstones of today's research, mostly rely on protein prior knowledge and on ab initio prediction algorithms. Such algorithms notably enforce an arbitrary criterion of one coding sequence (CDS) per transcript, leading to a substantial underestimation of the coding potential of eukaryotes. Here, we present OpenProt, the first database fully endorsing a polycistronic model of eukaryotic genomes to date. OpenProt contains all possible ORFs longer than 30 codons across 10 species, and cumulates supporting evidence such as protein conservation, translation and expression. OpenProt annotates all known proteins (RefProts), novel predicted isoforms (Isoforms) and novel predicted proteins from alternative ORFs (AltProts). It incorporates cutting-edge algorithms to evaluate protein orthology and re-interrogate publicly available ribosome profiling and mass spectrometry datasets, supporting the annotation of thousands of predicted ORFs. The constantly growing database currently cumulates evidence from 87 ribosome profiling and 114 mass spectrometry studies from several species, tissues and cell lines. All data is freely available and downloadable from a web platform (www.openprot.org) supporting a genome browser and advanced queries for each species. Thus, OpenProt enables a more comprehensive landscape of eukaryotic genomes' coding potential.
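To make the idea of "all possible ORFs longer than 30 codons" on a single transcript concrete, here is a minimal Python sketch of an exhaustive ORF scan. It is an illustration only, not OpenProt's actual pipeline. The function name `find_orfs`, the `Orf` record, the restriction to ATG start codons, the inclusive cutoff, and counting the stop codon toward ORF length are all assumptions made for this example.

```python
from typing import Iterator, NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: only the canonical start codon is considered


class Orf(NamedTuple):
    frame: int   # reading frame on the transcript: 0, 1 or 2
    start: int   # 0-based index of the first base of the start codon
    end: int     # 0-based index one past the last base of the stop codon
    codons: int  # codon count from the start codon through the stop codon


def find_orfs(transcript: str, min_codons: int = 30) -> Iterator[Orf]:
    """Yield every ATG-to-stop ORF of at least `min_codons` codons.

    All three forward frames are scanned independently, so one transcript
    can yield several ORFs instead of a single CDS. For each stop codon,
    only the ORF starting at the first in-frame ATG is reported.
    """
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA input
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == START_CODON:
                start = pos
            elif start is not None and codon in STOP_CODONS:
                n_codons = (pos + 3 - start) // 3
                if n_codons >= min_codons:
                    yield Orf(frame, start, pos + 3, n_codons)
                start = None  # resume searching after this stop codon


if __name__ == "__main__":
    # Toy transcript: one long ORF followed by a short one in the same frame.
    toy = "CC" + "ATG" + "GCT" * 29 + "TAA" + "ATG" + "GCT" * 3 + "TGA"
    for orf in find_orfs(toy):
        print(orf)
```

Scanning each frame separately is what lets overlapping or downstream ORFs on the same transcript be reported side by side, in contrast to the one-CDS-per-transcript criterion the abstract criticizes. Any ORF found this way would still need the supporting evidence the abstract lists (conservation, translation, expression).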

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b74/6323990/eb7ae8a14872/gky936fig1.jpg
