Wu Chun, Lu Xiaolong, Lu Shaohua, Wang Hongwei, Li Dehua, Zhao Jing, Jin Jingjie, Sun Zhenghua, He Qing-Yu, Chen Yang, Zhang Gong
Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China.
State Key Laboratory of Respiratory Disease, School of Basic Medical Sciences, Sino-French Hoffmann Institute, Guangzhou Medical University, Guangzhou, China.
Front Mol Biosci. 2022 Jun 2;9:895746. doi: 10.3389/fmolb.2022.895746. eCollection 2022.
Alternative splicing (AS) isoforms create numerous proteoforms, expanding the complexity of the genome. Highly similar sequences, incomplete reference databases, and the insufficient sequence coverage of mass spectrometry limit the identification of AS proteoforms. Here, we demonstrated a full-length translating mRNA (ribosome nascent-chain complex-bound mRNA, RNC-mRNA) sequencing (RNC-seq) strategy that sequences entire translating mRNAs using next-generation sequencing, including short-read and long-read technologies, to construct a protein database containing all translating AS isoforms. Taking advantage of its read length, short-read RNC-seq identified up to 15,289 genes and 15,906 AS isoforms in a single human cell line, far more than Ribo-seq. Single-molecule long-read RNC-seq added 4,429 annotated AS isoforms that were not identified in the short-read datasets and 4,525 novel AS isoforms that were absent from public databases. Using this RNC-seq-guided database, we identified 6,766 annotated protein isoforms and 50 novel protein isoforms in mass spectrometry datasets. These results demonstrate the potential of full-length RNC-seq for investigating the proteome of AS isoforms.
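As a rough illustration of the database-construction idea in the abstract (translating isoform coding sequences into a non-redundant protein database for mass spectrometry searches), a minimal Python sketch might look like the following. It is not the authors' pipeline: the function names, isoform IDs, and toy sequences are all hypothetical, and it assumes each coding sequence is already in frame, starts at its start codon, and uses the standard genetic code.

```python
# Minimal sketch: collapse translated AS isoforms into a non-redundant
# protein database. All names and sequences here are illustrative
# assumptions, not taken from the paper.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[i:i + 3].upper()]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def build_database(isoform_cds: dict[str, str]) -> dict[str, list[str]]:
    """Map each distinct protein sequence to the isoform IDs that encode it."""
    database: dict[str, list[str]] = {}
    for isoform_id, cds in isoform_cds.items():
        database.setdefault(translate(cds), []).append(isoform_id)
    return database


def to_fasta(database: dict[str, list[str]]) -> str:
    """Render the database as FASTA, one entry per distinct protein."""
    return "\n".join(
        f">{'|'.join(ids)}\n{protein}" for protein, ids in database.items()
    )


# Hypothetical isoforms: the first two differ only by a synonymous codon.
example_isoforms = {
    "GENE1-201": "ATGGCCAAATAA",
    "GENE1-202": "ATGGCCAAGTAA",
    "GENE1-203": "ATGGCCTGGTAA",
}
example_database = build_database(example_isoforms)
```

Calling `to_fasta(example_database)` would give two FASTA entries, because the first two toy isoforms encode the same protein. Collapsing identical sequences in this way keeps the search space small, which matters when many isoforms differ only slightly.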