Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China.
School of Medicine, Tsinghua University, Beijing, China.
Bioinformatics. 2017 Jul 15;33(14):i234-i242. doi: 10.1093/bioinformatics/btx247.
Translation initiation is a key step in the regulation of gene expression. In addition to the annotated translation initiation sites (TISs), translation may also start at multiple alternative TISs (including both AUG and non-AUG codons), which makes it challenging to predict TISs and to study the underlying regulatory mechanisms. Meanwhile, the advent of several high-throughput sequencing techniques for profiling initiating ribosomes at single-nucleotide resolution, e.g. GTI-seq and QTI-seq, provides abundant data for systematically studying the general principles of translation initiation and for developing computational methods for TIS identification.
We have developed a deep learning-based framework, named TITER, for accurately predicting TISs on a genome-wide scale based on QTI-seq data. TITER extracts the sequence features of translation initiation from the surrounding sequence contexts of TISs using a hybrid neural network and further integrates the prior preference of TIS codon composition into a unified prediction framework.
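The two ingredients described above, sequence context around a candidate TIS and a prior preference over TIS codon types, can be illustrated with a minimal sketch. This is not TITER's code: the function names, the flank size, and the choice to combine the two terms by multiplication are assumptions made for illustration, and the context score is passed in as a plain number where a trained network's output would go.

```python
# Illustrative sketch only; names and the multiplicative combination are
# assumptions, not taken from the TITER source code.
NUCLEOTIDES = "ACGU"


def to_rna(seq):
    """Upper-case a sequence and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")


def one_hot(seq):
    """Encode a sequence as rows of 0/1 values in the order A, C, G, U."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in to_rna(seq)]


def candidate_windows(transcript, codons, flank):
    """Yield (position, codon, window) for each candidate start codon.

    Only candidates with `flank` nucleotides available on both sides are
    reported, so every window has length 2 * flank + 3.
    """
    seq = to_rna(transcript)
    for i in range(flank, len(seq) - flank - 2):
        codon = seq[i:i + 3]
        if codon in codons:
            yield i, codon, seq[i - flank:i + 3 + flank]


def combined_score(context_score, codon, codon_prior):
    """Weight a context-based score by the prior for this codon type.

    Codons absent from `codon_prior` get a prior of 0.0.
    """
    return context_score * codon_prior.get(codon, 0.0)
```

In use, each window from `candidate_windows` would be passed through `one_hot` and scored by a trained model, and `combined_score` would then down-weight codon types that rarely act as initiation sites. The prior values themselves would have to be estimated from data such as QTI-seq; none are given here.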
Extensive tests demonstrated that TITER can greatly outperform the state-of-the-art prediction methods in identifying TISs. In addition, TITER was able to identify important sequence signatures for individual types of TIS codons, including a Kozak-sequence-like motif for the AUG start codon. Furthermore, the TITER prediction score can be related to the strength of translation initiation in various biological scenarios, including the repressive effect of upstream open reading frames on gene expression and the mutational effects influencing translation initiation efficiency.
TITER is available as open-source software and can be downloaded from https://github.com/zhangsaithu/titer.
Contact: lzhang20@mail.tsinghua.edu.cn or zengjy321@tsinghua.edu.cn.
Supplementary data are available at Bioinformatics online.