
Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations.

Authors

Sheynkman Gloria M, Johnson James E, Jagtap Pratik D, Shortreed Michael R, Onsongo Getiria, Frey Brian L, Griffin Timothy J, Smith Lloyd M

Affiliation

Chemistry Department, University of Wisconsin-Madison, 1101 University Ave, Madison, WI 53706, USA.

Publication

BMC Genomics. 2014 Aug 22;15(1):703. doi: 10.1186/1471-2164-15-703.

DOI: 10.1186/1471-2164-15-703
PMID: 25149441
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4158061/
Abstract

BACKGROUND

Current practice in mass spectrometry (MS)-based proteomics is to identify peptides by comparison of experimental mass spectra with theoretical mass spectra derived from a reference protein database; however, this strategy necessarily fails to detect peptide and protein sequences that are absent from the database. We and others have recently shown that customized proteomic databases derived from RNA-Seq data can be employed for MS-searching to both improve MS analysis and identify novel peptides. While this general strategy constitutes a significant advance for the discovery of novel protein variations, it has not been readily transferable to other laboratories due to the need for many specialized software tools. To address this problem, we have implemented readily accessible, modifiable, and extensible workflows within Galaxy-P, short for Galaxy for Proteomics, a web-based bioinformatic extension of the Galaxy framework for the analysis of multi-omics (e.g. genomics, transcriptomics, proteomics) data.
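The limitation described above can be illustrated with a toy example. The following is a minimal sketch, not the authors' workflow: it compares peptide sequences directly, whereas a real search engine scores experimental spectra against theoretical spectra. The protein sequences, the helper name `tryptic_peptides`, and the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages) are all assumptions made for illustration.

```python
import re

def tryptic_peptides(protein, min_len=6):
    """In silico tryptic digest: cleave after K or R unless followed by P.

    Simplified rule with no missed cleavages; peptides shorter than
    min_len are discarded.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if len(p) >= min_len}

# Hypothetical sequences: the sample carries a single substitution (L13V)
# that is absent from the reference database.
reference_protein = "MSTLEDVKAAGHLPEWRGGSTFDNKLLQ"
sample_protein = "MSTLEDVKAAGHVPEWRGGSTFDNKLLQ"

reference_peptides = tryptic_peptides(reference_protein)
# A customized database adds the sample-specific sequence to the reference.
custom_peptides = reference_peptides | tryptic_peptides(sample_protein)

observed = "AAGHVPEWR"  # variant peptide actually present in the sample
print(observed in reference_peptides)  # False: the reference search cannot match it
print(observed in custom_peptides)     # True: the customized database contains it
```

The variant peptide is only recoverable once the search space includes the sample-specific sequence, which is the motivation for deriving the database from RNA-Seq data.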

RESULTS

We present three bioinformatic workflows that allow the user to upload raw RNA sequencing reads and convert the data into high-quality customized proteomic databases suitable for MS searching. We show the utility of these workflows on human and mouse samples, identifying 544 peptides containing single amino acid polymorphisms (SAPs) and 187 peptides corresponding to unannotated splice junction peptides, correlating protein and transcript expression levels, and providing the option to incorporate transcript abundance measures within the MS database search process (reduced databases, incorporation of transcript abundance for protein identification score calculations, etc.).
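As a rough illustration of what a customized database entry for a single amino acid polymorphism might look like, the sketch below substitutes one residue in a protein sequence and formats the result as a FASTA record. The function names `apply_sap` and `fasta_entry`, the toy sequence, and the header format are hypothetical; the actual Galaxy-P workflows derive variants from RNA-Seq reads through tools that are not reproduced here.

```python
def apply_sap(protein, position, ref, alt):
    """Return the protein with one residue substituted (1-based position).

    Raises ValueError if the residue at `position` is not `ref`, which
    guards against applying a variant to the wrong sequence.
    """
    if protein[position - 1] != ref:
        raise ValueError(
            f"expected {ref} at position {position}, found {protein[position - 1]}"
        )
    return protein[:position - 1] + alt + protein[position:]

def fasta_entry(header, sequence, width=60):
    """Format a sequence as a FASTA record with wrapped lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"

# Hypothetical protein and variant (L13V), for illustration only.
protein = "MSTLEDVKAAGHLPEWRGGSTFDNKLLQ"
variant = apply_sap(protein, 13, "L", "V")
entry = fasta_entry("toy_protein_L13V", variant)
print(entry)
```

Appending such records to a reference FASTA file is one simple way to make variant peptides visible to a database search; how the published workflows assemble their databases is described in the full paper.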

CONCLUSIONS

Using RNA-Seq data to enhance MS analysis is a promising strategy to discover novel peptides specific to a sample and, more generally, to improve proteomics results. The main bottleneck for widespread adoption of this strategy has been the lack of easily used and modifiable computational tools. We provide a solution to this problem by introducing a set of workflows within the Galaxy-P framework that converts raw RNA-Seq data into customized proteomic databases.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/683e/4158061/5a82411b122f/12864_2014_6401_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/683e/4158061/57e796f145d7/12864_2014_6401_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/683e/4158061/cfa647d34c05/12864_2014_6401_Fig3_HTML.jpg

