A deep audit of the PeptideAtlas database uncovers evidence for unannotated coding genes and aberrant translation.

Author information

Rodriguez Jose Manuel, Maquedano Miguel, Cerdan-Velez Daniel, Calvo Enrique, Vazquez Jesús, Tress Michael L

Affiliations

Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain.

CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain.

Publication information

bioRxiv. 2024 Nov 15:2024.11.14.623419. doi: 10.1101/2024.11.14.623419.

DOI:10.1101/2024.11.14.623419

PMID:39605392

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11601488/

Abstract

The human genome has been the subject of intense scrutiny by experimental and manual curation projects for more than two decades. Novel coding genes have been proposed from large-scale RNASeq, ribosome profiling and proteomics experiments. Here we carry out an in-depth analysis of an entire proteomics database. We analysed the proteins, peptides and spectra housed in the human build of the PeptideAtlas proteomics database to identify coding regions that are not yet annotated in the GENCODE reference gene set. We find support for hundreds of missing alternative protein isoforms and unannotated upstream translations, and evidence of cross-contamination from other species. There was reliable peptide evidence for 34 novel unannotated open reading frames (ORFs) in PeptideAtlas. We find that almost half belong to coding genes that are missing from GENCODE and other reference sets. Most of the remaining ORFs were not conserved beyond human, however, and their peptide confirmation was restricted to cancer cell lines. We show that this is strong evidence for aberrant translation, raising important questions about the extent of aberrant translation and how these ORFs should be annotated in reference genomes.

