Discovery of novel genes and gene isoforms by integrating transcriptomic and proteomic profiling from mouse liver.

Authors

Wu Peng, Zhang Hongyu, Lin Weiran, Hao Yunwei, Ren Liangliang, Zhang Chengpu, Li Ning, Wei Handong, Jiang Ying, He Fuchu

Affiliation

State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, 33 Life Science Park Road, Beijing 102206, China.

Publication information

J Proteome Res. 2014 May 2;13(5):2409-19. doi: 10.1021/pr4012206. Epub 2014 Apr 18.

DOI:10.1021/pr4012206

PMID:24717071

Abstract

Comprehensively identifying gene expression at both the transcriptomic and proteomic levels of a tissue is a prerequisite for a deeper understanding of its biological functions. Alternative splicing and RNA editing, two main forms of transcriptional processing, play important roles in transcriptome and proteome diversity and result in multiple isoforms for one gene. These isoforms are hard to identify by mass spectrometry (MS)-based proteomics approaches because standard protein databases contain relatively little isoform information. In our study, we applied MS and RNA-Seq in parallel to mouse liver tissue and captured a considerable catalogue of transcripts and proteins that covered 60% and 34%, respectively, of protein-coding genes in Ensembl. We then developed a bioinformatics workflow for building a customized protein database that for the first time included new splicing-derived peptides and RNA-editing-caused peptide variants, allowing us to identify protein isoforms more completely. Using this experimentally determined database, we identified a total of 150 peptides not present in standard biological databases at a false discovery rate of <1%, corresponding to 72 novel splicing isoforms, 43 new genetic regions, and 15 RNA-editing sites. Of these, 11 randomly selected novel events passed experimental verification by PCR and Sanger sequencing. New discoveries of gene products with high confidence at two omics levels demonstrated the robustness and effectiveness of our approach and its potential application to improving genome annotation. All the MS data have been deposited in iProx (http://www.iprox.org) with the identifier IPX00003601.

Similar articles

Discovery of novel genes and gene isoforms by integrating transcriptomic and proteomic profiling from mouse liver.

J Proteome Res. 2014 May 2;13(5):2409-19. doi: 10.1021/pr4012206. Epub 2014 Apr 18.

Identification of novel alternative splicing biomarkers for breast cancer with LC/MS/MS and RNA-Seq.

BMC Bioinformatics. 2020 Dec 3;21(Suppl 9):541. doi: 10.1186/s12859-020-03824-8.

Identification of Protein Isoforms Using Reference Databases Built from Long and Short Read RNA-Sequencing.

J Proteome Res. 2022 Jul 1;21(7):1628-1639. doi: 10.1021/acs.jproteome.1c00968. Epub 2022 May 25.

Identification of a novel protein isoform derived from cancer-related splicing variants using combined analysis of transcriptome and proteome.

Proteomics. 2011 Jun;11(11):2275-82. doi: 10.1002/pmic.201100016. Epub 2011 May 5.

Detection of alternative splice variants at the proteome level in Aspergillus flavus.

J Proteome Res. 2010 Mar 5;9(3):1209-17. doi: 10.1021/pr900602d.

Proteomic Validation of Transcript Isoforms, Including Those Assembled from RNA-Seq Data.

J Proteome Res. 2015 Sep 4;14(9):3541-54. doi: 10.1021/pr5011394. Epub 2015 May 20.

[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].

Yi Chuan Xue Bao. 2004 May;31(5):431-43.

Determining Alternative Protein Isoform Expression Using RNA Sequencing and Mass Spectrometry.

STAR Protoc. 2020 Oct 21;1(3):100138. doi: 10.1016/j.xpro.2020.100138. eCollection 2020 Dec 18.

Integrated Transcriptomic-Proteomic Analysis Using a Proteogenomic Workflow Refines Rat Genome Annotation.

Mol Cell Proteomics. 2016 Jan;15(1):329-39. doi: 10.1074/mcp.M114.047126. Epub 2015 Nov 11.

SpliceProt: a protein sequence repository of predicted human splice variants.

Proteomics. 2014 Feb;14(2-3):181-5. doi: 10.1002/pmic.201300078.

Cited by

An urgent call on revisions to current genome annotation strategies.

Sci China Life Sci. 2023 Aug;66(8):1942-1943. doi: 10.1007/s11427-023-2350-5. Epub 2023 Apr 27.

Large-scale long terminal repeat insertions produced a significant set of novel transcripts in cotton.

Sci China Life Sci. 2023 Aug;66(8):1711-1724. doi: 10.1007/s11427-022-2341-8. Epub 2023 Apr 17.

Identification of Novel Genes and Proteoforms in through a Proteogenomic Approach.

Pathogens. 2022 Oct 31;11(11):1273. doi: 10.3390/pathogens11111273.

A-to-I RNA Editing Contributes to Proteomic Diversity in Cancer.

Cancer Cell. 2018 May 14;33(5):817-828.e7. doi: 10.1016/j.ccell.2018.03.026. Epub 2018 Apr 26.

A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data.

PLoS One. 2017 May 1;12(5):e0176185. doi: 10.1371/journal.pone.0176185. eCollection 2017.

Improvement of peptide identification with considering the abundance of mRNA and peptide.

BMC Bioinformatics. 2017 Feb 16;18(1):109. doi: 10.1186/s12859-017-1491-5.

Integrating transcriptomic and proteomic data for accurate assembly and annotation of genomes.

Genome Res. 2017 Jan;27(1):133-144. doi: 10.1101/gr.201368.115. Epub 2016 Nov 15.

Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences.

Brief Bioinform. 2018 Mar 1;19(2):286-302. doi: 10.1093/bib/bbw114.

PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq.

BMC Bioinformatics. 2016 Jun 17;17(1):244. doi: 10.1186/s12859-016-1133-3.

PepPSy: a web server to prioritize gene products in experimental and biocuration workflows.

Database (Oxford). 2016 May 12;2016. doi: 10.1093/database/baw070. Print 2016.
