PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq.

Author information

Wen Bo, Xu Shaohang, Zhou Ruo, Zhang Bing, Wang Xiaojing, Liu Xin, Xu Xun, Liu Siqi

Affiliations

BGI-Shenzhen, Shenzhen, 518083, China.

Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA.

Publication information

BMC Bioinformatics. 2016 Jun 17;17(1):244. doi: 10.1186/s12859-016-1133-3.

PMID:27316337

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4912784/

Abstract

BACKGROUND

Peptide identification based on mass spectrometry (MS) is generally achieved by comparing experimental mass spectra with peptides theoretically digested from a reference protein database. This strategy, however, cannot identify peptide or protein sequences that are absent from the reference database. Customized protein databases built from RNA-Seq data have therefore been proposed to assist with and improve the identification of novel peptides. Correspondingly, a comprehensive pipeline that provides an end-to-end solution for novel peptide detection with a customized protein database is needed.
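Why a reference-only search misses novel peptides can be illustrated with a toy Python sketch. It assumes a simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages), a made-up protein sequence, and a hypothetical single amino acid substitution; the helper names `tryptic_digest` and `apply_sap` are illustrative and are not part of PGA, which is an R package. The point is only that the variant peptide never appears in the digest of the reference sequence, so no reference-database search can match it.

```python
from __future__ import annotations

import re

# Simplified trypsin rule (assumption): cleave after K or R unless the next residue is P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(protein: str, min_len: int = 6) -> set[str]:
    """In-silico digestion with no missed cleavages; fragments shorter than min_len are dropped."""
    return {pep for pep in TRYPSIN_SITE.split(protein) if len(pep) >= min_len}


def apply_sap(protein: str, pos: int, ref: str, alt: str) -> str:
    """Substitute one residue at 1-based position `pos` (a toy SAP)."""
    if protein[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {protein[pos - 1]}")
    return protein[: pos - 1] + alt + protein[pos:]


# Made-up reference protein and a hypothetical F15L substitution.
reference = "MADEQLKSGTVWNAFYRLEHGDNIKAMPETLSSR"
variant = apply_sap(reference, 15, "F", "L")

reference_peptides = tryptic_digest(reference)
variant_peptides = tryptic_digest(variant)

# Peptides that a search space built only from the reference cannot contain.
novel = variant_peptides - reference_peptides
print(sorted(novel))  # ['SGTVWNALYR']
```

Real search engines also allow missed cleavages and compare spectra rather than sequences; the sketch keeps only the database-coverage argument.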

RESULTS

A pipeline, implemented as the R package PGA, was developed to automate the processing of tandem mass spectrometry (MS/MS) data acquired from different MS platforms and the construction of customized protein databases from RNA-Seq data, with or without a reference genome as a guide. PGA identifies novel peptides and generates an HTML-based report with a visual interface. Applied to a published dataset, PGA identified 636 novel peptides: 510 single amino acid polymorphism (SAP) peptides, 2 INDEL peptides, 49 splice junction peptides, and 75 novel transcript-derived peptides. The software is freely available from http://bioconductor.org/packages/PGA/, and example reports are available at http://wenbostar.github.io/PGA/.
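The customized-database idea can be sketched in Python under stated assumptions: reference and RNA-Seq-derived sequences are written into one FASTA search database, and an identified peptide is flagged as novel when every entry containing it is a non-reference entry. The `REF_`/`VAR_` header prefixes, the toy sequences, and the helper names `to_fasta` and `is_novel` are hypothetical and do not describe PGA's actual database format or API.

```python
from __future__ import annotations

# Hypothetical header convention for this sketch only: REF_ marks reference entries,
# VAR_ marks entries derived from RNA-Seq (e.g. SAPs, INDELs, junctions, novel transcripts).
VARIANT_PREFIX = "VAR_"


def to_fasta(entries: dict[str, str], width: int = 60) -> str:
    """Serialize {header: sequence} to FASTA text, wrapping sequences at `width` residues."""
    lines: list[str] = []
    for header, seq in entries.items():
        lines.append(f">{header}")
        lines.extend(seq[i : i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def is_novel(peptide: str, database: dict[str, str]) -> bool:
    """True if the peptide occurs in the database, but only in variant entries."""
    hits = [header for header, seq in database.items() if peptide in seq]
    return bool(hits) and all(h.startswith(VARIANT_PREFIX) for h in hits)


# Toy customized database: one reference protein plus one hypothetical SAP variant.
custom_db = {
    "REF_P1": "MADEQLKSGTVWNAFYRLEHGDNIKAMPETLSSR",
    "VAR_P1_F15L": "MADEQLKSGTVWNALYRLEHGDNIKAMPETLSSR",
}

print(to_fasta(custom_db, width=20), end="")
print(is_novel("SGTVWNALYR", custom_db))  # True: found only in the variant entry
print(is_novel("LEHGDNIK", custom_db))    # False: also found in the reference entry
```

Exact substring matching is a simplification; a real pipeline also has to deal with issues such as isoleucine and leucine being indistinguishable by mass.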

CONCLUSIONS

The PGA pipeline, designed to be platform-independent and easy to use, was successfully developed and shown to identify novel peptides by searching customized protein databases derived from RNA-Seq data.
