Bioinformatics and Genomics Program, Centre for Genomic Regulation, 08003 Barcelona, Spain.
Bioinformatics. 2013 May 1;29(9):1112-9. doi: 10.1093/bioinformatics/btt096. Epub 2013 Feb 28.
Aligning RNAs is useful for searching for homologous genes, studying evolutionary relationships, detecting conserved regions and identifying patterns that may be biologically relevant. Poor conservation among homologs, however, makes RNA sequences difficult to compare, even when the sequences are closely related evolutionarily.
We describe SARA-Coffee, a tertiary structure-based multiple RNA aligner, which has been validated using BRAliDARTS, a new benchmark framework designed for evaluating tertiary structure-based multiple RNA aligners. We provide two methods to measure the capacity of alignments to match corresponding secondary and tertiary structure features. On this benchmark, SARA-Coffee outperforms both regular aligners and those using secondary structure information. Furthermore, we show that on sequences in which <60% of the nucleotides form base pairs, primary sequence methods usually perform better than secondary-structure aware aligners.
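The 60% base-pairing threshold mentioned above can be illustrated with a minimal sketch. It assumes the secondary structure is available as a dot-bracket string, which is an assumption of this example and not a format stated in the abstract. The function names `paired_fraction` and `suggest_aligner_family` are hypothetical and are not part of SARA-Coffee or BRAliDARTS.

```python
# Illustrative sketch only: estimates the fraction of base-paired nucleotides
# from a dot-bracket string (an assumed input format) and applies the 60%
# threshold reported in the abstract. Not part of the SARA-Coffee package.

PAIRING_SYMBOLS = set("()[]{}<>")  # bracket characters that mark paired positions


def paired_fraction(dot_bracket: str) -> float:
    """Return the fraction of positions marked as base-paired."""
    if not dot_bracket:
        raise ValueError("structure string must not be empty")
    paired = sum(1 for symbol in dot_bracket if symbol in PAIRING_SYMBOLS)
    return paired / len(dot_bracket)


def suggest_aligner_family(dot_bracket: str, threshold: float = 0.60) -> str:
    """Hypothetical helper: below the threshold, sequence-based methods
    usually performed better in the reported benchmark."""
    if paired_fraction(dot_bracket) < threshold:
        return "sequence-based"
    return "structure-aware"


if __name__ == "__main__":
    for structure in ("((((...))))", "((.....))."):
        fraction = paired_fraction(structure)
        print(structure, round(fraction, 3), suggest_aligner_family(structure))
```

This is a rule of thumb derived from the benchmark result and not a decision procedure used by the authors. Pseudoknot notations that use letters would need additional handling.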
The package and the datasets are available from http://www.tcoffee.org/Projects/saracoffee and http://structure.biofold.org/sara/.