EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, United Kingdom.
RNA. 2011 Apr;17(4):578-94. doi: 10.1261/rna.2536111. Epub 2011 Feb 28.
With the availability of genome-wide transcription data and massive comparative sequencing, two tasks have emerged as core analyses: discriminating coding from noncoding RNAs, and assessing the coding potential of evolutionarily conserved regions. Here we present RNAcode, a program that detects coding regions in multiple sequence alignments. It is optimized for emerging applications that current protein gene-finding software does not cover. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also handles real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component. It can therefore be applied "out of the box," without any training, to data from all domains of life. We describe the RNAcode method and apply it, in combination with mass spectrometry experiments, to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as "noncoding." RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode.
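The abstract does not spell out RNAcode's statistical model. It says only that the model combines nucleotide substitution and gap patterns. The sketch below is a simplified illustration of those two raw signals and is not RNAcode's scoring. It assumes a pairwise, gapped alignment that is read in frame 0 and uses the standard genetic code. In protein-coding regions, substitutions tend to be synonymous and gaps tend to have lengths that are multiples of three. The function name `coding_signals` and its output keys are hypothetical. Codon triplets that contain a gap are simply skipped, which is a crude approximation.

```python
import re
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def coding_signals(seq_a: str, seq_b: str) -> dict:
    """Hypothetical helper: count codon-level substitution types and gap types
    in a pairwise alignment, assuming frame 0 (illustration only, not RNAcode)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Accept RNA input by mapping U to T.
    a = seq_a.upper().replace("U", "T")
    b = seq_b.upper().replace("U", "T")

    synonymous = nonsynonymous = 0
    for start in range(0, len(a) - 2, 3):
        ca, cb = a[start:start + 3], b[start:start + 3]
        # Skip gapped or identical codons (crude simplification).
        if "-" in ca or "-" in cb or ca == cb:
            continue
        aa_a, aa_b = CODON_TABLE.get(ca), CODON_TABLE.get(cb)
        if aa_a is None or aa_b is None:  # ambiguous bases such as N
            continue
        if aa_a == aa_b:
            synonymous += 1
        else:
            nonsynonymous += 1

    # Gap runs whose length is a multiple of 3 preserve the reading frame.
    in_frame = frameshift = 0
    for seq in (a, b):
        for run in re.finditer(r"-+", seq):
            if len(run.group()) % 3 == 0:
                in_frame += 1
            else:
                frameshift += 1

    return {
        "synonymous": synonymous,
        "nonsynonymous": nonsynonymous,
        "in_frame_gaps": in_frame,
        "frameshift_gaps": frameshift,
    }
```

For example, `coding_signals("ATGCTGAAAGGC", "ATGTTGAAGGGC")` reports two synonymous codon changes (CTG/TTG and AAA/AAG), which is a coding-like pattern. A single-base gap such as `"ATGC-GAAAGGC"` is counted as a frameshift. A real method would weigh such counts against a null model rather than report them directly.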