College of Arts and Sciences, Boise State University, Boise, ID, USA.
BMC Genomics. 2013 Feb 28;14:141. doi: 10.1186/1471-2164-14-141.
Proteogenomic mapping is an approach that uses mass spectrometry data from proteins to directly map protein-coding genes and could aid in locating translational regions in the human genome. In concert with the ENcyclopedia of DNA Elements (ENCODE) project, we applied proteogenomic mapping to produce proteogenomic tracks for the UCSC Genome Browser, to explore which putative translational regions may be missing from the human genome.
We generated ~1 million high-resolution tandem mass (MS/MS) spectra for the Tier 1 ENCODE cell lines K562 and GM12878 and mapped them against the UCSC hg19 human genome and the GENCODE V7 annotated protein and transcript sets. We then compared the results of the three searches to identify the best-matching peptide for each MS/MS spectrum, thereby increasing confidence in the putative new protein-coding regions found via the whole genome search. At a 1% false discovery rate, we identified 26,472, 24,406, and 13,128 peptides from the protein, transcript, and whole genome searches, respectively; of these, 481 were found solely via the whole genome search. The proteogenomic mapping data are available on the UCSC Genome Browser at http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeUncBsuProt.
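The comparison step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Match` record, the `e_value` field (a hypothetical score where lower is better), and the rule that a peptide counts as "genome only" when every spectrum it wins was matched in the genome search are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One peptide-spectrum match from one search (hypothetical record)."""
    spectrum_id: str
    peptide: str
    search: str     # "protein", "transcript", or "genome"
    e_value: float  # assumed score: lower means a better match


def best_match_per_spectrum(matches):
    """Keep the lowest-scoring match for each spectrum across all searches.

    On an exact tie, the match seen first is kept.
    """
    best = {}
    for m in matches:
        current = best.get(m.spectrum_id)
        if current is None or m.e_value < current.e_value:
            best[m.spectrum_id] = m
    return best


def genome_only_peptides(best):
    """Peptides whose winning matches all come from the genome search."""
    searches_by_peptide = {}
    for m in best.values():
        searches_by_peptide.setdefault(m.peptide, set()).add(m.search)
    return {p for p, s in searches_by_peptide.items() if s == {"genome"}}


# Toy data, invented for the example.
matches = [
    Match("s1", "PEPTIDEA", "protein", 1e-5),
    Match("s1", "PEPTIDEB", "genome", 1e-3),
    Match("s2", "NOVELPEP", "genome", 1e-6),
    Match("s2", "OTHERPEP", "transcript", 1e-2),
]
best = best_match_per_spectrum(matches)
novel = genome_only_peptides(best)
```

Here spectrum `s1` is assigned to the protein-search peptide, so its weaker genome hit is discarded, while `s2` is best explained by a peptide seen only in the genome search. Requiring a genome hit to beat the annotated alternatives for the same spectrum is what raises confidence in a putative new coding region.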
The whole genome search revealed that ~4% of the uniquely mapping identified peptides were located outside GENCODE V7 annotated exons. The comparison of the results from the disparate searches also identified 15% more spectra than would have been found solely from a protein database search. Therefore, whole genome proteogenomic mapping is a complementary method for genome annotation when performed in conjunction with other searches.
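As a rough illustration of how the fraction of peptides mapping outside annotated exons might be computed, the sketch below tests each peptide's genomic interval against a list of exon intervals. The tuple layout `(chrom, start, end)`, the half-open coordinate convention, and the decision to ignore strand are assumptions for this example, not details taken from the study.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def outside_exons(peptide, exons):
    """True if the peptide interval overlaps no exon on the same chromosome."""
    chrom, start, end = peptide
    return not any(
        overlaps(start, end, ex_start, ex_end)
        for ex_chrom, ex_start, ex_end in exons
        if ex_chrom == chrom
    )


def fraction_outside(peptides, exons):
    """Fraction of peptide intervals that fall entirely outside all exons."""
    if not peptides:
        return 0.0
    return sum(outside_exons(p, exons) for p in peptides) / len(peptides)


# Toy coordinates, invented for the example.
exons = [("chr1", 100, 200)]
peptides = [
    ("chr1", 150, 180),  # inside the exon
    ("chr1", 300, 330),  # same chromosome, no overlap
    ("chr2", 150, 180),  # different chromosome
    ("chr1", 200, 230),  # starts where the exon ends, no overlap
]
share = fraction_outside(peptides, exons)
```

A linear scan over exons is enough for a toy example. For a full annotation set, an interval tree or sorted-interval search would be the more practical choice.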