Proteogenomic Analysis of Breast Cancer Transcriptomic and Proteomic Data, Using De Novo Transcript Assembly: Genome-Wide Identification of Novel Peptides and Clinical Implications.

Affiliations

Mazumdar Shaw Center for Translational Research, Narayana Health, Bangalore, India.

Simulation and Modeling Sciences, Pfizer Pharma GmbH, Berlin, Germany.

Publication information

Mol Cell Proteomics. 2022 Apr;21(4):100220. doi: 10.1016/j.mcpro.2022.100220. Epub 2022 Feb 26.

Abstract

We carried out a proteogenomic analysis of the breast cancer transcriptomic and proteomic data available from the Clinical Proteomic Tumor Analysis Consortium resource to identify novel peptides arising from alternative splicing events and other noncanonical expression. Our pipeline consisted of de novo transcript assembly, a six-frame-translated custom database, and a combination of search engines. The 4,387 novel peptide sequences initially identified were further screened with the PepQuery validation tool (Clinical Proteomic Tumor Analysis Consortium), which yielded 1,558 novel peptides. We used these 1,558 PepQuery-validated peptides to assess functional and clinical significance, leaving the rest to be verified with other validation tools and approaches. The novel peptides mapped to known gene sequences as well as to genomic regions not yet defined as translated: 580 mapped to known protein-coding genes, 147 to non-protein-coding genes, and 831 belonged to novel translational sequences. The novel peptides belonging to protein-coding genes represented alternatively spliced events or 5' or 3' extensions, whereas the others represented translation from pseudogenes, long noncoding RNAs, or uncharacterized protein-coding sequences, mostly from the intronic regions of known genes. Seventy-six of the 580 protein-coding genes were associated with cancer hallmark genes, which included key oncogenes, transcription factors, kinases, and cell surface receptors. Survival association analysis of the 76 novel peptide sequences revealed 10 of them to be significant, and we present a panel of six novel peptides whose high expression was strongly associated with poor survival of patients with the human epidermal growth factor receptor 2-enriched subtype. Our analysis represents a landscape of novel peptides of different types that may be expressed in breast cancer tissues, although their presence in full-length functional proteins needs further investigation.
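
The six-frame translation step named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`reverse_complement`, `translate`, `six_frame_translate`) are hypothetical, the standard genetic code is assumed, and a real workflow would also split the translations at stop codons and write them out as a FASTA search database.

```python
# Illustrative sketch only: six-frame translation of one assembled transcript.
# All names here are hypothetical; the standard genetic code is assumed.

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Build the codon table in the conventional T, C, A, G ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate from the first base; '*' marks a stop, 'X' an unknown codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


def six_frame_translate(seq: str) -> dict:
    """Translate a sequence in three forward and three reverse frames."""
    frames = {}
    reverse = reverse_complement(seq)
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translate("ATGGCCTAA").items():
        print(frame, protein)
```

Translating every frame on both strands makes no assumption about where translation starts, which is what allows peptides from regions not annotated as coding to be matched against mass spectra.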

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9dc4/9020135/3c21060bd1f8/fx1.jpg
