Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics.

Author information

Fermin Damian, Allen Baxter B, Blackwell Thomas W, Menon Rajasree, Adamski Marcin, Xu Yin, Ulintz Peter, Omenn Gilbert S, States David J

Affiliation

Bioinformatics Program, University of Michigan, Ann Arbor, MI 48109, USA.

Publication information

Genome Biol. 2006;7(4):R35. doi: 10.1186/gb-2006-7-4-r35. Epub 2006 Apr 28.

Abstract

BACKGROUND

Defining the location of genes and the precise nature of gene products remains a fundamental challenge in genome annotation. Interrogating tandem mass spectrometry data using genomic sequence provides an unbiased method to identify novel translation products. A six-frame translation of the entire human genome was used as the query database to search for novel blood proteins in the data from the Human Proteome Organization Plasma Proteome Project. Because this target database is orders of magnitude larger than the databases traditionally employed in tandem mass spectra analysis, careful attention to significance testing is required. Confidence of identification is assessed using our previously described Poisson statistic, which estimates the significance of multi-peptide identifications incorporating the length of the matching sequence, number of spectra searched and size of the target sequence database.
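As an illustration of what a six-frame translation involves, the following is a minimal Python sketch, not the authors' pipeline. It translates a DNA string in three reading frames on each strand using the standard genetic code. The function names (`six_frame_translation`, `stop_to_stop_orfs`) and the stop-to-stop definition of an open reading frame are assumptions made for this example; the abstract does not specify how open reading frames were delimited.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate both strands in all three frames, keyed '+1'..'+3', '-1'..'-3'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def stop_to_stop_orfs(protein, min_len=2):
    """Split a translated frame at stops (assumed ORF definition for this sketch)."""
    return [segment for segment in protein.split("*") if len(segment) >= min_len]


frames = six_frame_translation("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein, stop_to_stop_orfs(protein))
```

On a whole genome the same idea would be applied chromosome by chromosome, which is why the resulting search database is so much larger than a curated protein database.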

RESULTS

Applying a false discovery rate threshold of 0.05, we identified 282 significant open reading frames, each containing two or more peptide matches. There were 627 novel peptides associated with these open reading frames that mapped to a unique genomic coordinate placed within the start/stop points of previously annotated genes. These peptides matched 1,110 distinct tandem MS spectra. Peptides fell into four categories based upon where their genomic coordinates placed them relative to annotated exons within the parent gene.
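The abstract does not say which procedure was used to control the false discovery rate. As a hedged illustration of how a threshold of 0.05 can be applied to a list of per-ORF p-values, the sketch below uses the Benjamini-Hochberg step-up procedure, which is a common choice and an assumption here. The function name and the example p-values are hypothetical and are not data from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return sorted indices of hypotheses accepted at FDR level alpha.

    Benjamini-Hochberg step-up: find the largest rank k such that the
    k-th smallest p-value is <= alpha * k / m, then accept ranks 1..k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for five candidate open reading frames.
example_p = [0.001, 0.5, 0.02, 0.2, 0.009]
print(benjamini_hochberg(example_p, alpha=0.05))
```

With these made-up values, the candidates at indices 0, 2 and 4 pass the threshold and the other two do not.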

CONCLUSION

This work provides evidence for novel alternative splice variants in many previously annotated genes. These findings suggest that annotation of the genome is not yet complete and that proteomics has the potential to further add to our understanding of gene structures.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a439/1557991/7e8d7fb02707/gb-2006-7-4-r35-1.jpg
