COSSMO: predicting competitive alternative splice site selection using deep learning.

Affiliations

Deep Genomics Inc, Toronto, Canada.

Department of Computer Science, University of Toronto, Toronto, Canada.

Publication information

Bioinformatics. 2018 Jul 1;34(13):i429-i437. doi: 10.1093/bioinformatics/bty244.

Abstract

MOTIVATION

Alternative splice site selection is inherently competitive: the probability that a given splice site is used also depends on the strength of neighboring sites. Here, we present a new model, the competitive splice site model (COSSMO), which explicitly accounts for these competitive effects and predicts the percent selected index (PSI) distribution over any number of putative splice sites. We model an alternative splicing event as the choice of a 3' acceptor site conditional on a fixed upstream 5' donor site, or the choice of a 5' donor site conditional on a fixed 3' acceptor site. We build four different architectures that use convolutional layers, communication layers, long short-term memory and residual networks, respectively, to learn relevant motifs from sequence alone. We also construct a new dataset from genome annotations and RNA-Seq read data, which we use to train our model.
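The competitive framing can be illustrated with a small sketch. The abstract does not state how per-site scores are normalized, so the softmax used here is an assumption, and the function name `psi_distribution` and the score values are hypothetical. The point is only that a site's predicted share falls when a stronger neighbor is present, and that the output is a distribution over however many candidate sites an event has.

```python
import math
from typing import List


def psi_distribution(site_scores: List[float]) -> List[float]:
    """Turn per-site strength scores into a distribution that sums to 1.

    Assumption: a softmax over the candidate sites of one event.
    The abstract does not confirm this is COSSMO's output layer.
    """
    if not site_scores:
        raise ValueError("need at least one candidate splice site")
    peak = max(site_scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in site_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical scores for three candidate acceptor sites of one event.
weak_neighbors = psi_distribution([2.0, 0.5, 0.1])

# Same first site, but its neighbor is now stronger: the first site's
# share drops even though its own score is unchanged.
strong_neighbor = psi_distribution([2.0, 3.0, 0.1])

print(weak_neighbors[0], strong_neighbor[0])
```

Because the normalization runs over the list it is given, the same function handles events with two candidate sites or twenty, which mirrors the "any number of putative splice sites" wording above.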

RESULTS

COSSMO predicts the most frequently used splice site with an accuracy of 70% on unseen test data and achieves an R² of 0.6 in modeling the PSI distribution. We visualize the motifs that COSSMO learns from sequence and show that COSSMO recognizes the consensus splice site sequences and many known splicing factors with high specificity.
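As a rough illustration of the two reported metrics, the sketch below computes a top-1 accuracy (does the predicted most-used site match the observed one?) and an R² on toy data. The abstract does not define how R² is aggregated, so pooling all sites from all events into one coefficient of determination is an assumption; the function names and the numbers are hypothetical and are not results from the paper.

```python
from typing import List, Sequence

Event = Sequence[float]  # PSI values over the candidate sites of one event


def argmax(values: Event) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def top1_accuracy(pred: Sequence[Event], true: Sequence[Event]) -> float:
    """Fraction of events whose predicted top site is the observed top site."""
    hits = sum(1 for p, t in zip(pred, true) if argmax(p) == argmax(t))
    return hits / len(pred)


def r_squared(pred: Sequence[Event], true: Sequence[Event]) -> float:
    """Coefficient of determination over all sites pooled across events.

    Assumption: pooling is one plausible reading of the reported R²,
    not a definition taken from the paper.
    """
    flat_p: List[float] = [v for event in pred for v in event]
    flat_t: List[float] = [v for event in true for v in event]
    mean_t = sum(flat_t) / len(flat_t)
    ss_res = sum((t - p) ** 2 for p, t in zip(flat_p, flat_t))
    ss_tot = sum((t - mean_t) ** 2 for t in flat_t)
    return 1.0 - ss_res / ss_tot


# Toy data: two events with three and two candidate sites respectively.
observed = [[0.7, 0.2, 0.1], [0.1, 0.9]]
predicted = [[0.6, 0.3, 0.1], [0.6, 0.4]]

print(top1_accuracy(predicted, observed))
print(r_squared(predicted, observed))
```

Events may have different numbers of candidate sites, so both functions iterate per event rather than assuming a fixed-width array.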

AVAILABILITY AND IMPLEMENTATION

Model predictions, our training dataset, and code are available from http://cossmo.genes.toronto.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b5f8/6022534/d8e890b3f465/bty244f1.jpg
