Proteogenomic mapping as a complementary method to perform genome annotation.

Author information

Jaffe Jacob D, Berg Howard C, Church George M

Affiliation

Harvard Medical School Department of Genetics, Boston, MA 02115, USA.

Publication information

Proteomics. 2004 Jan;4(1):59-77. doi: 10.1002/pmic.200300511.

Abstract

The accelerated rate of genomic sequencing has led to an abundance of completely sequenced genomes. Annotation of the open reading frames (ORFs) (i.e., gene prediction) in these genomes is an important task and is most often performed computationally based on features in the nucleic acid sequence. Using recent advances in proteomics, we set out to predict the set of ORFs for an organism based principally on expressed protein-based evidence. Using a novel search strategy, we mapped peptides detected in a whole-cell lysate of Mycoplasma pneumoniae onto a genomic scaffold and extended these "hits" into ORFs bound by traditional genetic signals to generate a "proteogenomic map". We were able to generate an ORF model for M. pneumoniae strain FH using proteomic data with a high correlation to models based on sequence features. Ultimately, we detected over 81% of the genomically predicted ORFs in M. pneumoniae strain M129 (the originally sequenced strain). We were also able to detect several new ORFs not originally predicted by genomic methods, various N-terminal extensions, and some evidence that would suggest that certain predicted ORFs are bogus. Some of these differences may be a result of the strain analyzed but demonstrate the robustness of protein analysis across closely related genomes. This technique is a cost-effective means to add value to genome annotation, and a prerequisite for proteome quantitation and in vivo interaction measures.
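
The mapping idea in the abstract (place detected peptides on the genome, then extend each hit into an ORF) can be illustrated with a minimal sketch. This is not the authors' search strategy, which the abstract does not detail. It assumes exact string matching of a peptide against a six-frame translation and extends each hit only to the flanking stop codons, ignoring start-codon and other genetic signals. The function names are hypothetical, and the use of NCBI translation table 4 (where TGA encodes tryptophan, as in Mycoplasma) is an assumption not stated in the abstract.

```python
from itertools import product

BASES = "TCAG"
# Assumption: NCBI translation table 4, in which TGA encodes Trp (W), not stop.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def map_peptide(genome, peptide):
    """Return (strand, frame, orf_protein) for every exact peptide hit.

    Each hit is extended to the nearest upstream and downstream stop
    codons ('*') in the same reading frame, a simplification of
    extending hits into ORFs bounded by genetic signals.
    """
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                start = protein.rfind("*", 0, pos) + 1  # 0 if no upstream stop
                end = protein.find("*", pos)
                if end == -1:
                    end = len(protein)  # no downstream stop in this frame
                hits.append((strand, frame, protein[start:end]))
                pos = protein.find(peptide, pos + 1)
    return hits
```

For example, `map_peptide("TAAATGAAATGACCCGGGTAG", "KWP")` returns `[("+", 0, "MKWPG")]`: the peptide is found in forward frame 0 and extended to the stop-to-stop region. A realistic pipeline would instead match tandem mass spectra against the translated genome and score the matches, which this sketch does not attempt.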

