Whole human genome proteogenomic mapping for ENCODE cell line data: identifying protein-coding regions.

Affiliation

College of Arts and Sciences, Boise State University, Boise, ID, USA.

Publication information

BMC Genomics. 2013 Feb 28;14:141. doi: 10.1186/1471-2164-14-141.

DOI:10.1186/1471-2164-14-141

PMID:23448259

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3607840/

Abstract

BACKGROUND

Proteogenomic mapping is an approach that uses mass spectrometry data from proteins to directly map protein-coding genes and could aid in locating translational regions in the human genome. In concert with the ENcyclopedia of DNA Elements (ENCODE) project, we applied proteogenomic mapping to produce proteogenomic tracks for the UCSC Genome Browser, to explore which putative translational regions may be missing from the human genome.

RESULTS

We generated ~1 million high-resolution tandem mass (MS/MS) spectra for Tier 1 ENCODE cell lines K562 and GM12878 and mapped them against the UCSC hg19 human genome, and the GENCODE V7 annotated protein and transcript sets. We then compared the results from the three searches to identify the best-matching peptide for each MS/MS spectrum, thereby increasing the confidence of the putative new protein-coding regions found via the whole genome search. At a 1% false discovery rate, we identified 26,472, 24,406, and 13,128 peptides from the protein, transcript, and whole genome searches, respectively; of these, 481 were found solely via the whole genome search. The proteogenomic mapping data are available on the UCSC Genome Browser at http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeUncBsuProt.
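The comparison step described above, choosing the best-matching peptide for each spectrum across the three searches and then isolating peptides seen only in the whole genome search, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `PSM` record, the "higher score is better" convention, the tie-breaking rule, and the example peptides are all assumptions made for the sketch, and the input is assumed to have already been filtered at the chosen false discovery rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """Hypothetical peptide-spectrum match record (not from the paper)."""
    spectrum_id: str
    peptide: str
    score: float   # assumption: a higher score means a better match
    search: str    # "protein", "transcript", or "genome"


def best_match_per_spectrum(psms):
    """Keep the highest-scoring match for each spectrum across all searches.

    Ties keep the first match seen, so listing protein and transcript
    results before genome results favors annotated sequences on a tie.
    """
    best = {}
    for psm in psms:
        current = best.get(psm.spectrum_id)
        if current is None or psm.score > current.score:
            best[psm.spectrum_id] = psm
    return best


def genome_only_peptides(psms, best):
    """Peptides whose best match came from the genome search and that were
    not reported by the protein or transcript search for any spectrum."""
    annotated = {p.peptide for p in psms if p.search != "genome"}
    from_genome = {p.peptide for p in best.values() if p.search == "genome"}
    return from_genome - annotated


# Made-up example data, for illustration only.
psms = [
    PSM("s1", "PEPTIDEK", 45.0, "protein"),
    PSM("s1", "PEPTIDEK", 45.0, "genome"),
    PSM("s2", "LSVEAR", 12.0, "protein"),
    PSM("s2", "NQVELTR", 38.5, "genome"),
    PSM("s3", "AGFDLK", 30.0, "transcript"),
]

best = best_match_per_spectrum(psms)
novel = genome_only_peptides(psms, best)
```

In this toy input, spectrum `s2` is better explained by a genome-derived peptide than by its protein-database match, so that peptide is the only candidate for a putative new coding region.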

CONCLUSIONS

The whole genome search revealed that ~4% of the uniquely mapping identified peptides were located outside GENCODE V7 annotated exons. The comparison of the results from the disparate searches also identified 15% more spectra than would have been found solely from a protein database search. Therefore, whole genome proteogenomic mapping is a complementary method for genome annotation when performed in conjunction with other searches.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/518f/3607840/744670227986/1471-2164-14-141-1.jpg

Similar articles

GENCODE: the reference human genome annotation for The ENCODE Project.

Genome Res. 2012 Sep;22(9):1760-74. doi: 10.1101/gr.135350.111.

GENCODE: producing a reference annotation for ENCODE.

Genome Biol. 2006;7 Suppl 1(Suppl 1):S4.1-9. doi: 10.1186/gb-2006-7-s1-s4. Epub 2006 Aug 7.

ENCODE whole-genome data in the UCSC Genome Browser.

Nucleic Acids Res. 2010 Jan;38(Database issue):D620-5. doi: 10.1093/nar/gkp961. Epub 2009 Nov 17.

Proteogenomic mapping of Mycoplasma hyopneumoniae virulent strain 232.

BMC Genomics. 2014 Jul 8;15(1):576. doi: 10.1186/1471-2164-15-576.

Long noncoding RNAs are rarely translated in two human cell lines.

Genome Res. 2012 Sep;22(9):1646-57. doi: 10.1101/gr.134767.111.

Proteogenomic analysis of Mycobacterium tuberculosis by high resolution mass spectrometry.

Mol Cell Proteomics. 2011 Dec;10(12):M111.011627. doi: 10.1074/mcp.M111.011445. Epub 2011 Oct 3.

Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification.

BMC Genomics. 2016 Dec 22;17(Suppl 13):1031. doi: 10.1186/s12864-016-3327-5.

TTS mapping: integrative WEB tool for analysis of triplex formation target DNA sequences, G-quadruplets and non-protein coding regulatory DNA elements in the human genome.

BMC Genomics. 2009 Dec 3;10 Suppl 3(Suppl 3):S9. doi: 10.1186/1471-2164-10-S3-S9.

A proteogenomic analysis of Anopheles gambiae using high-resolution Fourier transform mass spectrometry.

Genome Res. 2011 Nov;21(11):1872-81. doi: 10.1101/gr.127951.111. Epub 2011 Jul 27.

Cited by

A Multi-Faceted Analysis Showing Transcripts and a Recently Confirmed Micropeptide as Important Players in Ovarian Carcinogenesis.

Int J Mol Sci. 2024 Apr 16;25(8):4381. doi: 10.3390/ijms25084381.

Efficient Detection of the Alternative Spliced Human Proteome Using Translatome Sequencing.

Front Mol Biosci. 2022 Jun 2;9:895746. doi: 10.3389/fmolb.2022.895746. eCollection 2022.

Mapping Microproteins and ncRNA-Encoded Polypeptides in Different Mouse Tissues.

Front Cell Dev Biol. 2021 Jul 26;9:687748. doi: 10.3389/fcell.2021.687748. eCollection 2021.

Profiles of alternative splicing events in the diagnosis and prognosis of Gastric Cancer.

J Cancer. 2021 Mar 19;12(10):2982-2992. doi: 10.7150/jca.46239. eCollection 2021.

Emerging role of long noncoding RNA-encoded micropeptides in cancer.

Cancer Cell Int. 2020 Oct 16;20:506. doi: 10.1186/s12935-020-01589-x. eCollection 2020.

A hidden human proteome encoded by 'non-coding' genes.

Nucleic Acids Res. 2019 Sep 5;47(15):8111-8125. doi: 10.1093/nar/gkz646.

Translatomics: The Global View of Translation.

Int J Mol Sci. 2019 Jan 8;20(1):212. doi: 10.3390/ijms20010212.

ProteomeGenerator: A Framework for Comprehensive Proteomics Based on de Novo Transcriptome Assembly and High-Accuracy Peptide Mass Spectral Matching.

J Proteome Res. 2018 Nov 2;17(11):3681-3692. doi: 10.1021/acs.jproteome.8b00295. Epub 2018 Oct 19.

Improvements to the Rice Genome Annotation Through Large-Scale Analysis of RNA-Seq and Proteomics Data Sets.

Mol Cell Proteomics. 2019 Jan;18(1):86-98. doi: 10.1074/mcp.RA118.000832. Epub 2018 Oct 6.

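Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification.

BMC Genomics. 2016 Dec 22;17(Suppl 13):1031. doi: 10.1186/s12864-016-3327-5.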
