Department of Computer Science, Stanford University, Stanford, California 94305, USA.
Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California 94305, USA.
RNA. 2020 Jul;26(7):851-865. doi: 10.1261/rna.074161.119. Epub 2020 Mar 27.
Subcellular localization is essential to RNA biogenesis, processing, and function across the gene expression life cycle. However, the specific nucleotide sequence motifs that direct RNA localization are incompletely understood. Fortunately, new sequencing technologies have provided transcriptome-wide atlases of RNA localization, creating an opportunity to leverage computational modeling. Here we present RNA-GPS, a new machine learning model that uses nucleotide-level features to predict RNA localization across eight different subcellular locations, the first model to provide such a wide range of predictions. RNA-GPS's design enables high-throughput sequence ablation and feature importance analyses to probe the sequence motifs that drive localization prediction. We find that localization-informative motifs are concentrated in 3'-UTRs and scattered along the coding sequence, and that motifs related to splicing are important drivers of predicted localization, even for cytotopic distinctions between membraneless bodies within the nucleus or between organelles within the cytoplasm. Overall, our results suggest that transcript splicing is one of many elements influencing RNA subcellular localization.
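The abstract does not specify RNA-GPS's exact features or its ablation procedure, so the following is only a minimal sketch of the general idea behind a sequence ablation analysis. It assumes k-mer count features and uses a hypothetical linear scorer (`toy_score`) as a stand-in for a trained localization model. A sliding window is masked with `N`, features are recomputed, and the drop in score is recorded for each window position. The function names (`kmer_counts`, `toy_score`, `ablation_profile`) are illustrative and are not the authors' code or API.

```python
from itertools import product


def kmer_counts(seq, k=3):
    """Count overlapping k-mers over the ACGU alphabet.

    DNA input is converted to RNA (T -> U). Any k-mer containing another
    character (e.g. the 'N' mask) is not in the vocabulary and is skipped.
    """
    seq = seq.upper().replace("T", "U")
    counts = {"".join(p): 0 for p in product("ACGU", repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    return counts


def toy_score(counts, weights):
    """Hypothetical linear scorer standing in for a trained model's output."""
    return sum(weights.get(kmer, 0.0) * c for kmer, c in counts.items())


def ablation_profile(seq, weights, window=5, k=3):
    """Mask each window with 'N'; return the score drop at each start position."""
    base = toy_score(kmer_counts(seq, k), weights)
    drops = []
    for start in range(len(seq) - window + 1):
        masked = seq[:start] + "N" * window + seq[start + window:]
        drops.append(base - toy_score(kmer_counts(masked, k), weights))
    return drops


if __name__ == "__main__":
    # Toy example: only the k-mer "UUU" carries weight, so windows that
    # overlap the single UUU occurrence produce a positive drop.
    print(ablation_profile("GGGAUUUAGGG", {"UUU": 1.0}, window=3))
```

Positions with large drops mark subsequences the (toy) model relies on. With a real trained model in place of `toy_score`, the same loop would highlight candidate localization-informative regions, such as those the abstract reports in 3'-UTRs.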