Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, 100871 Beijing, China.
Center for Precision Medicine Multi-Omics Research, Peking University Health Science Center, 100191 Beijing, China.
Anal Chem. 2023 Jul 18;95(28):10610-10617. doi: 10.1021/acs.analchem.3c00870. Epub 2023 Jul 9.
Alternative splicing allows a small number of human genes to encode a large number of proteoforms that play essential roles in normal and disease physiology. Some low-abundance proteoforms may remain undiscovered because of limited detection and analysis capabilities. Peptides co-encoded by a novel exon and an annotated exon that are separated by introns are called novel junction peptides, and they are the key to identifying novel proteoforms. Traditional de novo sequencing does not take into account the specific composition of novel junction peptides and is therefore not as accurate. We first developed a novel de novo sequencing algorithm, CNovo, which outperformed the mainstream tools PEAKS and Novor on all six test sets. We then built on CNovo to develop a semi-de novo sequencing algorithm, SpliceNovo, specifically for identifying novel junction peptides. SpliceNovo identifies junction peptides with much higher accuracy than CNovo, CJunction, PEAKS, and Novor. The built-in CNovo in SpliceNovo can also be replaced with other, more accurate de novo sequencing algorithms to further improve its performance. Using SpliceNovo, we also identified and validated two novel proteoforms of the human EIF4G1 and ELAVL1 genes. Our results significantly improve the ability to discover novel proteoforms through de novo sequencing.
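To make the definition of a junction peptide concrete, the following minimal Python sketch may help. It is not SpliceNovo or CNovo, and it performs no sequencing; it only illustrates which peptides from an in silico digest span the boundary between two exon-encoded regions. Several things here are assumptions made for illustration: the sequences are invented, the junction is treated at the amino acid level (ignoring codons split across the splice site), and a simplified trypsin rule is used (cleave after K or R unless followed by P, no missed cleavages). The function names `tryptic_peptides` and `junction_peptides` are hypothetical.

```python
import re


def tryptic_peptides(protein):
    """Yield (start, end, peptide) for a simplified trypsin digest.

    Assumed rule: cleave after K or R unless the next residue is P;
    missed cleavages are not modeled.
    """
    start = 0
    for match in re.finditer(r"[KR](?!P)", protein):
        end = match.end()
        yield start, end, protein[start:end]
        start = end
    if start < len(protein):
        # Trailing peptide that does not end in K or R.
        yield start, len(protein), protein[start:]


def junction_peptides(upstream_aa, downstream_aa):
    """Return peptides holding residues from both sides of the junction.

    upstream_aa:   residues encoded by one exon (for example, the annotated one)
    downstream_aa: residues encoded by the other exon (for example, the novel one)
    """
    protein = upstream_aa + downstream_aa
    boundary = len(upstream_aa)  # index of the first downstream residue
    return [
        peptide
        for start, end, peptide in tryptic_peptides(protein)
        if start < boundary < end
    ]


if __name__ == "__main__":
    # Invented sequences; the junction lies between L and T.
    print(junction_peptides("MSGRAEDL", "TPQWVRGHK"))
```

In this toy case only one peptide crosses the boundary. If a cleavage site falls exactly at the junction, no peptide spans it and the function returns an empty list.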