PepQuery enables fast, accurate, and convenient proteomic validation of novel genomic alterations.

Author affiliations

Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030, USA.

Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA.

Publication information

Genome Res. 2019 Mar;29(3):485-493. doi: 10.1101/gr.235028.118. Epub 2019 Jan 4.

DOI: 10.1101/gr.235028.118

PMID: 30610011

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6396417/

Abstract

Massively parallel or second-generation sequencing-based genomic studies continuously identify new genomic alterations that may lead to novel protein sequences, which are attractive candidates for disease biomarkers and therapeutic targets after proteomic validation. Integrative proteogenomic methods have been developed to use mass spectrometry (MS)-based proteomics data for such validation. These methods replace the reference sequence database in proteomic database searching with a customized protein database that incorporates sample- or disease-specific sequences derived from DNA or RNA sequencing, thus enabling the identification of novel protein sequences. Although useful, this spectrum-centric approach requires a full evaluation of all possible spectrum-peptide pairs, which is time-consuming, error-prone, and difficult to apply. Here, we present PepQuery, a peptide-centric approach that focuses on only novel DNA or protein sequences of interest. PepQuery allows quick and easy proteomic validation of genomic alterations without customized database construction. We demonstrated the sensitivity and specificity of the approach in validating completely novel proteins, novel splice junctions, and single amino acid variants using simulations and experimental data. Notably, enabling unrestricted modification searching in PepQuery reduced false positives by up to 95%. We implemented PepQuery as both web-based and stand-alone applications. The web version provides direct access to more than half a billion MS/MS spectra from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and other cancer proteomic studies. The stand-alone version supports batch analysis and user-provided MS/MS data. PepQuery will increase the usage of proteogenomics beyond the proteomics community and will broaden the application of proteogenomics in personalized medicine.
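The peptide-centric idea described above, starting from one peptide of interest and looking only at the spectra that could plausibly match it, can be illustrated with a minimal sketch. This is not PepQuery's implementation: the abstract does not describe its scoring or statistical evaluation, and the function names, the spectrum dictionary layout, and the 10 ppm tolerance below are all assumptions made for illustration. The sketch covers only the first filtering step, selecting candidate spectra whose precursor mass agrees with the target peptide.

```python
# Illustrative sketch only; not PepQuery's code, scoring, or statistics.
# All names and the 10 ppm default are hypothetical choices for this example.

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to the residue sum
PROTON = 1.00728   # mass of a proton, used to convert m/z to neutral mass


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def neutral_mass(precursor_mz, charge):
    """Neutral precursor mass from an observed m/z and charge state."""
    return (precursor_mz - PROTON) * charge


def candidate_spectra(peptide, spectra, tol_ppm=10.0):
    """Return ids of spectra whose precursor mass is within tol_ppm of the peptide."""
    target = peptide_mass(peptide)
    hits = []
    for spectrum in spectra:
        observed = neutral_mass(spectrum["precursor_mz"], spectrum["charge"])
        if abs(observed - target) / target * 1e6 <= tol_ppm:
            hits.append(spectrum["id"])
    return hits


# Toy input: two made-up spectra, only one near the mass of the query peptide.
example_spectra = [
    {"id": "scan_1", "precursor_mz": 400.6873, "charge": 2},
    {"id": "scan_2", "precursor_mz": 401.2000, "charge": 2},
]
print(candidate_spectra("PEPTIDE", example_spectra))
```

In a real workflow the retained spectra would still need to be scored against the query peptide and compared with competing explanations, such as reference peptides carrying modifications, which is the step the abstract credits with reducing false positives.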

Similar articles

Proteogenomic Analysis of Breast Cancer Transcriptomic and Proteomic Data, Using De Novo Transcript Assembly: Genome-Wide Identification of Novel Peptides and Clinical Implications.

Mol Cell Proteomics. 2022 Apr;21(4):100220. doi: 10.1016/j.mcpro.2022.100220. Epub 2022 Feb 26.

Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations.

BMC Genomics. 2014 Aug 22;15(1):703. doi: 10.1186/1471-2164-15-703.

Flexible and accessible workflows for improved proteogenomic analysis using the Galaxy framework.

J Proteome Res. 2014 Dec 5;13(12):5898-908. doi: 10.1021/pr500812t. Epub 2014 Oct 23.

A Massive Proteogenomic Screen Identifies Thousands of Novel Peptides From the Human "Dark" Proteome.

Mol Cell Proteomics. 2024 Feb;23(2):100719. doi: 10.1016/j.mcpro.2024.100719. Epub 2024 Jan 17.

Proteogenomics: concepts, applications and computational strategies.

Nat Methods. 2014 Nov;11(11):1114-25. doi: 10.1038/nmeth.3144.

Deep coverage of the Escherichia coli proteome enables the assessment of false discovery rates in simple proteogenomic experiments.

Mol Cell Proteomics. 2013 Nov;12(11):3420-30. doi: 10.1074/mcp.M113.029165. Epub 2013 Aug 1.

Proteome-wide onco-proteogenomic somatic variant identification in ER-positive breast cancer.

Clin Biochem. 2019 Apr;66:63-75. doi: 10.1016/j.clinbiochem.2019.01.005. Epub 2019 Jan 23.

Identification of new protein coding sequences and signal peptidase cleavage sites of Helicobacter pylori strain 26695 by proteogenomics.

J Proteomics. 2013 Jun 28;86:27-42. doi: 10.1016/j.jprot.2013.04.036. Epub 2013 May 9.

Proteogenomic studies on cancer drug resistance: towards biomarker discovery and target identification.

Expert Rev Proteomics. 2017 Apr;14(4):351-362. doi: 10.1080/14789450.2017.1299006. Epub 2017 Mar 6.

Cited by

Benchmarking Spectral Library and Database Search Approaches for Metaproteomics Using a Ground-Truth Microbiome Dataset.

bioRxiv. 2025 May 20:2025.05.15.654320. doi: 10.1101/2025.05.15.654320.

Deciphering human endogenous retrovirus expression in colorectal cancers: exploratory analysis regarding prognostic value in liver metastases.

EBioMedicine. 2025 Jun;116:105727. doi: 10.1016/j.ebiom.2025.105727. Epub 2025 May 16.

Potential shared neoantigens from pan-cancer transcript isoforms.

Sci Rep. 2025 May 7;15(1):15886. doi: 10.1038/s41598-025-00817-6.

Quality Control in the Mass Spectrometry Proteomics Core: A Practical Primer.

J Biomol Tech. 2024 Sep 12;35(3). doi: 10.7171/3fc1f5fe.42308a9a. eCollection 2024 Sep 30.

Open-Source and FAIR Research Software for Proteomics.

J Proteome Res. 2025 May 2;24(5):2222-2234. doi: 10.1021/acs.jproteome.4c01079. Epub 2025 Apr 23.

PepCentric Enables Fast Repository-Scale Proteogenomics Searches.

bioRxiv. 2025 Feb 28:2025.02.24.639867. doi: 10.1101/2025.02.24.639867.

Investigating proteogenomic divergence in patient-derived xenograft models of ovarian cancer.

Sci Rep. 2025 Jan 4;15(1):813. doi: 10.1038/s41598-024-84874-3.

Proteomics Can Rise to the Challenge of Pseudogenes' Coding Nature.

J Proteome Res. 2024 Dec 6;23(12):5233-5249. doi: 10.1021/acs.jproteome.4c00116. Epub 2024 Nov 1.

Detection of host cell microprotein impurities in antibody drug products.

Nat Commun. 2024 Oct 4;15(1):8605. doi: 10.1038/s41467-024-51870-0.

NCI's Proteomic Data Commons: A Cloud-Based Proteomics Repository Empowering Comprehensive Cancer Analysis through Cross-Referencing with Genomic and Imaging Data.

Cancer Res Commun. 2024 Sep 1;4(9):2480-2488. doi: 10.1158/2767-9764.CRC-24-0243.
