KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure Links 653, 9000 Gent, Belgium.
Biobix, Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure Links 653, 9000 Gent, Belgium.
Nucleic Acids Res. 2019 Apr 8;47(6):e36. doi: 10.1093/nar/gkz061.
Annotation of gene expression in prokaryotes often needs to be corrected because of small variations in the annotated gene regions observed between different (sub)species. It has become apparent that the traditional sequence alignment algorithms used to curate genomes cannot capture the full complexity of the genomic landscape. We present DeepRibo, a novel neural network that uses features extracted from ribosome profiling data and ribosome binding site sequence patterns, and show that it is a precise tool for delineating and annotating expressed genes in prokaryotes. The network combines recurrent memory cells and convolutional layers, integrating the information gained from the high-throughput ribosome profiling data and from the sequence region around the ribosome binding and translation initiation sites into one model. DeepRibo is designed as a single model trained on a variety of ribosome profiling experiments, and it identifies open reading frames in prokaryotes without a priori knowledge of the translational landscape. The effectiveness of DeepRibo is demonstrated through extensive validation of models trained on various data sets, using sequence similarity across multiple species as well as proteins verified by mass spectrometry and Edman degradation.
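To make the two-branch design described above more concrete, here is a minimal, hypothetical sketch in pure Python (standard library only). It is not DeepRibo's actual implementation. It assumes a gated recurrent unit (GRU) as the recurrent memory cell, reading per-nucleotide ribosome profiling coverage over a candidate ORF; a single convolutional layer with ReLU and global max-pooling over a one-hot encoded sequence window upstream of the start codon; and a logistic output that scores the candidate ORF. All function names (`one_hot`, `conv1d_relu_maxpool`, `gru`, `init_params`, `predict_orf`), the layer sizes, the log transform of read counts and the random, untrained weights are illustrative assumptions.

```python
import math
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as one 4-dim vector per base (order A, C, G, T)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def conv1d_relu_maxpool(x, kernels, width):
    """Valid 1D convolution (one output channel per kernel), ReLU, then global max-pooling."""
    pooled = []
    for k in kernels:  # k holds `width` rows of 4 weights
        best = 0.0  # starting at 0 applies the ReLU floor
        for i in range(len(x) - width + 1):
            act = sum(k[j][c] * x[i + j][c] for j in range(width) for c in range(4))
            best = max(best, act)
        pooled.append(best)
    return pooled


def gru(signal, p):
    """Run a GRU over a scalar signal and return the final hidden state (biases omitted)."""
    h = [0.0] * p["hidden"]
    for x in signal:
        xv = [x]
        z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], xv), matvec(p["Uz"], h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], xv), matvec(p["Ur"], h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(p["Wn"], xv), matvec(p["Un"], rh))]  # candidate
        h = [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
    return h


def init_params(hidden=8, n_kernels=4, width=6, seed=0):
    """Random, untrained weights with hypothetical sizes, for illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {"hidden": hidden, "width": width}
    for name in ("Wz", "Wr", "Wn"):
        p[name] = mat(hidden, 1)  # input-to-hidden weights (scalar input)
    for name in ("Uz", "Ur", "Un"):
        p[name] = mat(hidden, hidden)  # hidden-to-hidden weights
    p["kernels"] = [mat(width, 4) for _ in range(n_kernels)]
    p["out"] = mat(1, hidden + n_kernels)[0]  # logistic output layer
    return p


def predict_orf(coverage, upstream_seq, p):
    """Score a candidate ORF from its read coverage and the sequence upstream of its start codon."""
    signal = [math.log1p(c) for c in coverage]  # log transform damps very high counts (assumption)
    rnn_features = gru(signal, p)
    cnn_features = conv1d_relu_maxpool(one_hot(upstream_seq), p["kernels"], p["width"])
    features = rnn_features + cnn_features  # concatenate both branches
    return sigmoid(sum(w * f for w, f in zip(p["out"], features)))


params = init_params()
coverage = [0, 2, 5, 9, 4, 3, 6, 1, 0]  # toy per-nucleotide read counts
upstream = "TAAGGAGGTAATACAT"  # toy window with a Shine-Dalgarno-like AGGAGG motif
print(f"ORF score: {predict_orf(coverage, upstream, params):.3f}")
```

With untrained weights the printed score carries no biological meaning; in this sketch the parameters would have to be fitted on labelled ORFs before use. The design point it illustrates is that the recurrent branch consumes a coverage signal of arbitrary length, while the convolutional branch scans a fixed sequence window for motifs, and concatenating both feature vectors lets a single output layer weigh the two sources of evidence together.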