Ndah Elvis, Jonckheere Veronique, Giess Adam, Valen Eivind, Menschaert Gerben, Van Damme Petra
VIB-UGent Center for Medical Biotechnology, B-9000 Ghent, Belgium.
Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium.
Nucleic Acids Res. 2017 Nov 16;45(20):e168. doi: 10.1093/nar/gkx758.
Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomeE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation (https://github.com/Biobix/REPARATION). REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen for spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel (small) ORFs, including variants of previously annotated ORFs, and REPARATION predicted >70% of all annotated protein-coding ORFs (or variants of them) to be translated. Our predictions are supported by matching mass spectrometry proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it uses experimental translation evidence to intrinsically perform de novo ORF delineation in bacterial genomes, irrespective of the sequence features linked to open reading frames.
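The step of evaluating all possible ORFs in the genome can be illustrated with a minimal sketch. This is not the REPARATION code. The start codon set (ATG, GTG, TTG), the `min_codons` cutoff, and the names `Orf`, `scan_strand` and `candidate_orfs` are assumptions chosen for illustration. The sketch only enumerates candidates on both strands and does not model the Ribo-seq evidence or the threshold estimation.

```python
from typing import Iterator, List, NamedTuple

STARTS = {"ATG", "GTG", "TTG"}  # assumed start codons, not taken from the paper
STOPS = {"TAA", "TAG", "TGA"}   # standard stop codons
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    strand: str  # "+" or "-"
    start: int   # 0-based offset of the start codon on the scanned strand
    end: int     # offset just past the stop codon on the scanned strand
    seq: str     # nucleotide sequence from start codon through stop codon


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def scan_strand(seq: str, strand: str, min_codons: int) -> Iterator[Orf]:
    """Yield every start-to-stop ORF in the three frames of one strand."""
    for frame in range(3):
        starts = []  # in-frame start positions seen since the last stop codon
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if codon in STARTS:
                starts.append(pos)
            elif codon in STOPS:
                # Each pending start paired with this stop is one candidate.
                for s in starts:
                    if (pos - s) // 3 >= min_codons:  # length excludes the stop
                        yield Orf(strand, s, pos + 3, seq[s:pos + 3])
                starts = []


def candidate_orfs(genome: str, min_codons: int = 10) -> List[Orf]:
    """Enumerate candidate ORFs on both strands of a genome sequence."""
    genome = genome.upper()
    orfs = list(scan_strand(genome, "+", min_codons))
    orfs.extend(scan_strand(reverse_complement(genome), "-", min_codons))
    return orfs


if __name__ == "__main__":
    for orf in candidate_orfs("CCATGAAATTTGGGTAACC", min_codons=1):
        print(orf)
```

Coordinates for "-" strand candidates refer to the reverse-complemented sequence. A real pipeline would map them back to genome coordinates before comparing against an annotation.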