An integrated mass-spectrometry pipeline identifies novel protein coding-regions in the human genome.

Affiliation

Applied Computational Biology and Bioinformatics Group, Cancer Research UK, Paterson Institute for Cancer Research, The University of Manchester, Manchester, United Kingdom.

Publication information

PLoS One. 2010 Jan 28;5(1):e8949. doi: 10.1371/journal.pone.0008949.

PMID:20126623

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2812506/

Abstract

BACKGROUND

Most protein mass spectrometry (MS) experiments rely on searches against a database of known or predicted proteins, limiting their ability as a gene discovery tool.

RESULTS

Using a search against an in silico translation of the entire human genome, combined with a series of annotation filters, we identified 346 putative novel peptides [False Discovery Rate (FDR)<5%] in a MS dataset derived from two human breast epithelial cell lines. A subset of these were then successfully validated by a different MS technique. Two of these correspond to novel isoforms of Heterogeneous Ribonuclear Proteins, while the rest correspond to novel loci.
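The RESULTS paragraph compresses two computational steps into one sentence: searching spectra against an in silico translation of the genome, and keeping identifications below a 5% FDR. The Python sketch below illustrates both under assumptions of our own, not details taken from the paper: a six-frame translation with the standard genetic code, a simplified trypsin rule (cleave after K or R unless the next residue is P), a hypothetical minimum peptide length of 7, reversed-sequence decoys, and a plain target-decoy estimate (decoys divided by targets) for choosing a score cutoff. All function names are hypothetical, and the paper's actual search engines, annotation filters, and FDR procedure are not reproduced.

```python
from itertools import groupby

# Illustrative sketch only. Assumptions (not from the paper): standard genetic
# code, six-frame translation, simplified trypsin rule, reversed decoys,
# and a simple decoys/targets FDR estimate.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip(CODONS, AMINO_ACIDS))  # '*' marks stop codons


def revcomp(dna):
    """Reverse complement of an ACGT string (other characters pass through)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; codons with non-ACGT bases become 'X'."""
    return "".join(CODE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Three forward and three reverse-complement reading frames."""
    dna = dna.upper()
    return [translate(strand[f:]) for strand in (dna, revcomp(dna)) for f in range(3)]


def tryptic_peptides(fragment, min_len=7):
    """Simplified trypsin rule: cut after K/R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(fragment):
        if aa in "KR" and (i + 1 == len(fragment) or fragment[i + 1] != "P"):
            peptides.append(fragment[start:i + 1])
            start = i + 1
    peptides.append(fragment[start:])  # C-terminal remainder (may be empty)
    return [p for p in peptides if len(p) >= min_len]


def search_space(dna, min_len=7):
    """Target peptides from all six frames, split at stop codons ('*')."""
    peptides = set()
    for frame in six_frames(dna):
        for fragment in frame.split("*"):
            peptides.update(p for p in tryptic_peptides(fragment, min_len) if "X" not in p)
    return peptides


def decoys(peptides):
    """Simple reversed-sequence decoys (one common choice; an assumption here)."""
    return {p[::-1] for p in peptides}


def score_cutoff(psms, max_fdr=0.05):
    """psms: (score, is_decoy) pairs. Return the lowest observed score s such
    that decoys/targets among PSMs scoring >= s is <= max_fdr, or None."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = n_decoys = 0
    cutoff = None
    # Group tied scores so a cutoff never splits PSMs with equal scores.
    for score, group in groupby(ranked, key=lambda p: p[0]):
        for _, is_decoy in group:
            if is_decoy:
                n_decoys += 1
            else:
                targets += 1
        if targets and n_decoys / targets <= max_fdr:
            cutoff = score
    return cutoff
```

For example, `search_space("ATG" + "GGT" * 7 + "AAA" + "TAA")` contains the peptide `MGGGGGGGK` from the first forward frame. For a whole genome this set becomes very large, so a practical pipeline would more likely process one chromosome at a time and pass the peptides to a search engine rather than hold them in a single Python set.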

CONCLUSIONS

MS technology can be used for ab initio gene discovery in human data. Because it rests on different underlying assumptions, this approach identifies protein-coding genes not found by other techniques. As MS technology continues to evolve, such approaches will become increasingly powerful.

Similar articles

Integration of mass spectrometry and RNA-Seq data to confirm human ab initio predicted genes and lncRNAs.

Proteomics. 2014 Dec;14(23-24):2760-8. doi: 10.1002/pmic.201400174.

[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].

Yi Chuan Xue Bao. 2004 May;31(5):431-43.

Genome annotation of Anopheles gambiae using mass spectrometry-derived data.

BMC Genomics. 2005 Sep 19;6:128. doi: 10.1186/1471-2164-6-128.

Integrated Proteomic Pipeline Using Multiple Search Engines for a Proteogenomic Study with a Controlled Protein False Discovery Rate.

J Proteome Res. 2016 Nov 4;15(11):4082-4090. doi: 10.1021/acs.jproteome.6b00376. Epub 2016 Aug 30.

Interrogating the human genome using uninterpreted mass spectrometry data.

Proteomics. 2001 May;1(5):651-67. doi: 10.1002/1615-9861(200104)1:5<651::AID-PROT651>3.0.CO;2-N.

Whole genome searching with shotgun proteomic data: applications for genome annotation.

J Proteome Res. 2008 Jan;7(1):80-8. doi: 10.1021/pr070198n. Epub 2007 Dec 7.

Comprehensive mass spectrometric analysis of the 20S proteasome complex.

Methods Enzymol. 2005;405:187-236. doi: 10.1016/S0076-6879(05)05009-3.

Discovery of coding regions in the human genome by integrated proteogenomics analysis workflow.

Nat Commun. 2018 Mar 2;9(1):903. doi: 10.1038/s41467-018-03311-y.

Integrated Transcriptomic-Proteomic Analysis Using a Proteogenomic Workflow Refines Rat Genome Annotation.

Mol Cell Proteomics. 2016 Jan;15(1):329-39. doi: 10.1074/mcp.M114.047126. Epub 2015 Nov 11.

Cited by

In silico prediction of housekeeping long intergenic non-coding RNAs reveals HKlincR1 as an essential player in lung cancer cell survival.

Sci Rep. 2019 May 14;9(1):7372. doi: 10.1038/s41598-019-43758-7.

Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation.

Annu Rev Anal Chem (Palo Alto Calif). 2016 Jun 12;9(1):521-45. doi: 10.1146/annurev-anchem-071015-041722. Epub 2016 Mar 30.

The bacterial proteogenomic pipeline.

BMC Genomics. 2014;15 Suppl 9(Suppl 9):S19. doi: 10.1186/1471-2164-15-S9-S19. Epub 2014 Dec 8.

A global non-coding RNA system modulates fission yeast protein levels in response to stress.

Nat Commun. 2014 May 23;5:3947. doi: 10.1038/ncomms4947.

Integrating genomic, transcriptomic, and interactome data to improve Peptide and protein identification in shotgun proteomics.

J Proteome Res. 2014 Jun 6;13(6):2715-23. doi: 10.1021/pr500194t. Epub 2014 May 12.

On the extent and role of the small proteome in the parasitic eukaryote Trypanosoma brucei.

BMC Biol. 2014 Feb 19;12:14. doi: 10.1186/1741-7007-12-14.

Deducing protein function by forensic integrative cell biology.

PLoS Biol. 2013 Dec;11(12):e1001742. doi: 10.1371/journal.pbio.1001742. Epub 2013 Dec 17.

HiRIEF LC-MS enables deep proteome coverage and unbiased proteogenomics.

Nat Methods. 2014 Jan;11(1):59-62. doi: 10.1038/nmeth.2732. Epub 2013 Nov 17.

Deep coverage of the Escherichia coli proteome enables the assessment of false discovery rates in simple proteogenomic experiments.

Mol Cell Proteomics. 2013 Nov;12(11):3420-30. doi: 10.1074/mcp.M113.029165. Epub 2013 Aug 1.

Discovery and mass spectrometric analysis of novel splice-junction peptides using RNA-Seq.

Mol Cell Proteomics. 2013 Aug;12(8):2341-53. doi: 10.1074/mcp.O113.028142. Epub 2013 Apr 29.
