School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia.
Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Sydney, New South Wales 2113, Australia.
J Proteome Res. 2022 Jul 1;21(7):1628-1639. doi: 10.1021/acs.jproteome.1c00968. Epub 2022 May 25.
Alternative splicing can lead to distinct protein isoforms, which can have different functions in specific cells and tissues or at different developmental stages. In this study, we explored whether transcripts assembled from long-read, nanopore-based direct RNA sequencing (RNA-seq) could improve the identification of protein isoforms in human K562 cells. By comparison with Illumina-based short-read RNA-seq, we showed that a large proportion of Ensembl transcripts (5949/14,326) and genes expressing alternatively spliced transcripts (486/2981) identified with long direct reads were missed by short paired-end reads. By co-analyzing proteomic and transcriptomic data, we also showed that some peptides (826/35,976), proteins (262/3215), and protein isoforms arising from distinct transcript variants (574/1212) identified with isoform-specific peptides via custom long-read-based databases were missed in Illumina-derived databases. Finally, we generated unequivocal peptide evidence for a set of protein isoforms and showed that long-read direct RNA-seq allows the discovery of novel protein isoforms not already present in reference databases or in custom databases built from short-read RNA-seq data. Our analysis highlights the benefits of long-read RNA-seq data in generating reference databases to increase tandem mass spectrometry (MS/MS) identification of protein isoforms.
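Isoform-specific peptides are what allow a protein isoform, rather than just its gene, to be credited with an identification. The sketch below illustrates the general idea and is not the authors' pipeline. It assumes a simplified trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, a peptide length window of 7 to 30 residues, and that isoleucine and leucine are distinct (MS/MS cannot actually tell them apart). The function names `tryptic_digest` and `isoform_specific_peptides`, the isoform IDs, and the protein sequences are all hypothetical toy data.

```python
import re
from collections import defaultdict

# Simplified trypsin rule (assumption): cleave after K or R unless the next
# residue is P. Splitting on a zero-width pattern requires Python 3.7+.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(sequence, min_len=7, max_len=30):
    """Return fully cleaved tryptic peptides (no missed cleavages) within a length window."""
    pieces = TRYPSIN_SITE.split(sequence)
    return [p for p in pieces if min_len <= len(p) <= max_len]


def isoform_specific_peptides(isoforms, min_len=7, max_len=30):
    """Map each isoform ID to the peptides found in no other entry of `isoforms`.

    `isoforms` is a dict {isoform_id: protein_sequence} standing in for the whole
    search database, so uniqueness is checked only against its entries.
    I and L are treated as distinct residues in this simplified sketch.
    """
    peptide_to_isoforms = defaultdict(set)
    for isoform_id, sequence in isoforms.items():
        for peptide in tryptic_digest(sequence, min_len, max_len):
            peptide_to_isoforms[peptide].add(isoform_id)

    specific = defaultdict(set)
    for peptide, owners in peptide_to_isoforms.items():
        if len(owners) == 1:  # peptide is explained by exactly one isoform
            (owner,) = owners
            specific[owner].add(peptide)
    return dict(specific)


if __name__ == "__main__":
    # Hypothetical toy isoforms built from three "exons":
    # AGSTELVKDNA | FWQYHMR | GILEAPTSR. ISO_B skips the middle exon.
    toy_db = {
        "ISO_A": "AGSTELVKDNAFWQYHMRGILEAPTSR",
        "ISO_B": "AGSTELVKDNAGILEAPTSR",
    }
    for isoform_id, peptides in sorted(isoform_specific_peptides(toy_db).items()):
        print(isoform_id, sorted(peptides))
```

Running it prints `ISO_A ['DNAFWQYHMR', 'GILEAPTSR']` and `ISO_B ['DNAGILEAPTSR']`. Both the skipped exon and the exon-skipping junction yield peptides that only one isoform can explain, whereas `AGSTELVK` is shared and cannot discriminate between them. A transcript variant absent from the database (for example, one not assembled from short reads) contributes no such peptides, so its isoform cannot be identified this way. A real proteogenomic search would also allow missed cleavages and check uniqueness against the full proteome.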