PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq.

Author information

Wen Bo, Xu Shaohang, Zhou Ruo, Zhang Bing, Wang Xiaojing, Liu Xin, Xu Xun, Liu Siqi

Affiliations

BGI-Shenzhen, Shenzhen, 518083, China.

Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA.

Publication information

BMC Bioinformatics. 2016 Jun 17;17(1):244. doi: 10.1186/s12859-016-1133-3.

Abstract

BACKGROUND

Peptide identification based on mass spectrometry (MS) generally works by comparing experimental mass spectra with theoretical peptides obtained by in silico digestion of a reference protein database. This strategy cannot identify peptide and protein sequences that are absent from the reference database. A customized protein database built from RNA-Seq data has therefore been proposed to improve the identification of novel peptides. A comprehensive pipeline that provides an end-to-end solution for novel peptide detection with such a customized database is correspondingly needed.
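To illustrate why a reference-only search misses variant peptides, here is a minimal sketch; it is not PGA's implementation. It assumes a simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages) and compares peptides by sequence rather than by spectrum. The protein sequence, the S-to-N substitution, and the helper name `tryptic_digest` are all hypothetical.

```python
import re

def tryptic_digest(protein, min_len=6):
    """In silico digestion with a simplified trypsin rule.

    Cleaves after K or R unless the next residue is P; no missed cleavages.
    Returns the set of peptides with at least min_len residues.
    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    fragments = re.split(r"(?<=[KR])(?!P)", protein)
    return {f for f in fragments if len(f) >= min_len}

# Hypothetical reference protein (not taken from the paper).
reference = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSR"

# Hypothetical single amino acid polymorphism: S -> N at 0-based position 12,
# standing in for a variant called from RNA-Seq data.
variant = reference[:12] + "N" + reference[13:]

reference_peptides = tryptic_digest(reference)
customized_peptides = reference_peptides | tryptic_digest(variant)

# Peptides that only the customized database can explain.
novel_candidates = customized_peptides - reference_peptides
print(sorted(novel_candidates))
```

In this toy case the only candidate is the peptide that spans the substituted residue, which a search against the reference digest alone could never report.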

RESULTS

An R package named PGA was developed as a pipeline that automates the processing of tandem mass spectrometry (MS/MS) data acquired on different MS platforms and the construction of customized protein databases from RNA-Seq data, with or without a reference genome as a guide. PGA identifies novel peptides and generates an HTML-based report with a visual interface. Applied to a published dataset, PGA identified 636 novel peptides: 510 single amino acid polymorphism (SAP) peptides, 2 INDEL peptides, 49 splice junction peptides, and 75 novel transcript-derived peptides. The software is freely available from http://bioconductor.org/packages/PGA/, and example reports are available at http://wenbostar.github.io/PGA/.
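The four category counts can be checked against the stated total with simple arithmetic. The sketch below only restates the numbers from the abstract; the dictionary name is illustrative.

```python
# Novel peptide counts by category, as reported in the abstract.
novel_peptide_counts = {
    "SAP": 510,
    "INDEL": 2,
    "splice junction": 49,
    "novel transcript-derived": 75,
}

total = sum(novel_peptide_counts.values())
print(f"total novel peptides: {total}")  # 636, matching the reported total

# Share of each category, in percent of the total.
for category, count in novel_peptide_counts.items():
    print(f"{category}: {count} ({100 * count / total:.1f}%)")
```

SAP peptides make up about 80% of the novel identifications in this dataset.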

CONCLUSIONS

The PGA pipeline, designed to be platform-independent and easy to use, was successfully developed and shown to identify novel peptides by searching a customized protein database derived from RNA-Seq data.
