Chao Kuan-Hao, Mao Alan, Salzberg Steven L, Pertea Mihaela
Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA.
Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA.
bioRxiv. 2023 Jul 29:2023.07.27.550754. doi: 10.1101/2023.07.27.550754.
The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. Here we describe Splam, a novel method for predicting splice junctions in DNA based on deep residual convolutional neural networks. Unlike some previous models, Splam looks at a relatively limited window of 400 base pairs flanking each splice site, motivated by the observation that the biological process of splicing relies primarily on signals within this window. Additionally, Splam introduces the idea of training the network on donor and acceptor pairs together, based on the principle that the splicing machinery recognizes both ends of each intron at once. We compare Splam's accuracy to recent state-of-the-art splice site prediction methods, particularly SpliceAI, another method that uses deep neural networks. Our results show that Splam is consistently more accurate than SpliceAI, with an overall accuracy of 96% at predicting human splice junctions. Splam generalizes even to non-human species, including distant ones like the flowering plant Arabidopsis thaliana. Finally, we demonstrate the use of Splam on a novel application: processing the spliced alignments of RNA-seq data to identify and eliminate errors. We show that when used in this manner, Splam yields substantial improvements in the accuracy of downstream transcriptome analysis of both poly(A) and ribo-depleted RNA-seq libraries. Overall, Splam offers a faster and more accurate approach to detecting splice junctions, while also providing a reliable and efficient solution for cleaning up erroneous spliced alignments.
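The paired donor/acceptor input described above can be illustrated with a short sketch. This is not Splam's code: it assumes that each 400-bp window is centred on the exon/intron boundary (200 bp per side), that the donor and acceptor windows are simply concatenated into one 800-bp input, and that bases are one-hot encoded with out-of-range positions padded as N. The helper names (`site_window`, `one_hot`, `junction_input`) are hypothetical.

```python
from typing import List

FLANK = 200      # assumption: 200 bp on each side -> 400 bp window per splice site
BASES = "ACGT"   # column order for the one-hot encoding


def site_window(seq: str, pos: int, flank: int = FLANK) -> str:
    """Return the 2*flank bases centred on boundary `pos` (0-based), N-padded past the ends."""
    start, end = pos - flank, pos + flank
    left_pad = "N" * max(0, -start)
    right_pad = "N" * max(0, end - len(seq))
    return left_pad + seq[max(0, start):min(len(seq), end)] + right_pad


def one_hot(window: str) -> List[List[int]]:
    """Encode A/C/G/T as one-hot rows; N or any other symbol becomes an all-zero row."""
    rows = []
    for base in window.upper():
        row = [0, 0, 0, 0]
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


def junction_input(seq: str, donor: int, acceptor: int, flank: int = FLANK) -> List[List[int]]:
    """Build one input covering both intron ends, so donor and acceptor are scored as a pair.

    donor:    index of the first intronic base (e.g. the G of a canonical GT)
    acceptor: index of the first base after the intron (just past the AG)
    """
    return one_hot(site_window(seq, donor, flank) + site_window(seq, acceptor, flank))


if __name__ == "__main__":
    # Toy sequence: exon (A*300) | intron GT...AG | exon (T*300)
    seq = "A" * 300 + "GT" + "C" * 500 + "AG" + "T" * 300
    x = junction_input(seq, donor=300, acceptor=804)
    print(len(x), len(x[0]))  # 800 positions x 4 channels
```

Encoding both ends in a single input is one simple way to express the idea that the splicing machinery recognizes both ends of an intron at once; a classifier trained on such inputs would score a junction rather than an isolated site.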