Center for Genomics and Systems Biology, New York University, New York, NY, USA.
Scientific & Digital Innovation, Sanofi Pasteur, Cambridge, MA, USA.
Methods Mol Biol. 2021;2252:313-329. doi: 10.1007/978-1-0716-1150-0_15.
The identification of upstream open reading frames (uORFs) using ribosome profiling data is complicated by several factors, including the noise inherent to the procedure, the substantial increase in potential translation initiation sites (and false positives) when non-canonical start codons are included, and the paucity of molecularly validated uORFs. Here we present uORF-seqr, a novel machine learning algorithm that uses ribosome profiling data, in conjunction with RNA-seq data and transcript-aware genome annotation files, to identify statistically significant AUG and near-cognate codon uORFs.
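To make the point about the expanded search space concrete, the following is a minimal sketch of how candidate uORFs might be enumerated in a 5' leader sequence. It is not the uORF-seqr algorithm and involves no ribosome profiling or RNA-seq signal. It assumes that near-cognate codons are those differing from AUG at exactly one nucleotide, that sequences are written in the DNA alphabet (ATG for AUG), and that a candidate must end at an in-frame stop codon inside the supplied sequence. The function names and the toy sequence are hypothetical.

```python
BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def near_cognate_codons(start="ATG"):
    """Return the codons that differ from `start` at exactly one position."""
    codons = set()
    for i, base in enumerate(start):
        for alt in BASES:
            if alt != base:
                codons.add(start[:i] + alt + start[i + 1:])
    return codons


def candidate_uorfs(utr5, starts):
    """Yield (start, end, start_codon) for each candidate ORF in `utr5`.

    A candidate begins at any codon in `starts` and ends at the first
    in-frame stop codon found within `utr5`. Coordinates are 0-based,
    end-exclusive. Starts with no in-frame stop in the sequence are
    skipped (an assumption made for this illustration only).
    """
    utr5 = utr5.upper()
    for i in range(len(utr5) - 2):
        codon = utr5[i:i + 3]
        if codon not in starts:
            continue
        # Walk downstream codon by codon, in frame with the start.
        for j in range(i + 3, len(utr5) - 2, 3):
            if utr5[j:j + 3] in STOP_CODONS:
                yield (i, j + 3, codon)
                break


# Hypothetical toy leader sequence, for illustration only.
toy_utr5 = "CCATGAAATAGCTGGCCTGA"

aug_only = list(candidate_uorfs(toy_utr5, {"ATG"}))
with_near_cognates = list(
    candidate_uorfs(toy_utr5, {"ATG"} | near_cognate_codons())
)
print(len(aug_only), len(with_near_cognates))
```

Under the one-mismatch assumption, allowing near-cognate codons raises the number of admissible start codons from one to ten, so the candidate set can only grow. That growth is why sequence-based enumeration alone cannot separate translated uORFs from false positives.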