Tremblay-Savard Olivier, Reinharz Vladimir, Waldispühl Jérôme
School of Computer Science, McGill University, Montreal, H3A 0E9, Canada.
Department of Computer Science, University of Manitoba, Winnipeg, R3T 2N2, Canada.
BMC Genomics. 2016 Nov 11;17(Suppl 10):862. doi: 10.1186/s12864-016-3105-4.
Secondary structures form the scaffold of multiple sequence alignments of non-coding RNA (ncRNA) families. An accurate reconstruction of ancestral ncRNAs must use this structural signal. However, inferring the ancestors of a single ncRNA family from a single consensus structure may bias the results towards sequences with high affinity to this structure, which are far from the true ancestors.
In this paper, we introduce achARNement, a maximum parsimony approach that, given two alignments of homologous ncRNA families with consensus secondary structures and a phylogenetic tree, simultaneously calculates ancestral RNA sequences for these two families.
We test our methodology on simulated data sets and show that achARNement not only outperforms classical maximum parsimony approaches in terms of accuracy, but also reduces the number of candidate sequences by several orders of magnitude. To conclude this study, we apply our algorithms to the Glm clan and the FinP-traJ clan from the Rfam database.
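To make the baseline concrete, the following is a minimal sketch of a classical, structure-unaware maximum parsimony pass (Fitch's algorithm) on a toy tree. It is not the achARNement algorithm, which additionally uses two consensus structures. The tree, the leaf sequences, and the names `TREE`, `LEAVES`, `fitch_sets`, and `candidate_count` are illustrative assumptions, not taken from the paper. The sketch shows why the number of candidate ancestors can grow quickly: when sites are treated independently, it is the product of the per-site candidate set sizes.

```python
from math import prod

# Hypothetical toy tree: internal node -> (left child, right child).
TREE = {"root": ("anc1", "C"), "anc1": ("A", "B")}
# Hypothetical aligned leaf sequences (equal length, no gaps).
LEAVES = {"A": "GAU", "B": "GCU", "C": "GCC"}


def fitch_sets(tree, leaves, root):
    """Bottom-up Fitch pass.

    Returns (sets, cost): sets maps each node to a list of per-site
    candidate nucleotide sets, and cost is the parsimony score
    (minimum number of substitutions over all sites).
    """
    sets = {name: [{c} for c in seq] for name, seq in leaves.items()}
    cost = 0

    def visit(node):
        nonlocal cost
        if node in sets:  # leaf, or internal node already computed
            return sets[node]
        left, right = (visit(child) for child in tree[node])
        merged = []
        for a, b in zip(left, right):
            common = a & b
            if common:
                merged.append(common)  # children agree: keep intersection
            else:
                merged.append(a | b)   # children disagree: union, one change
                cost += 1
        sets[node] = merged
        return merged

    visit(root)
    return sets, cost


def candidate_count(site_sets):
    """Number of sequences obtained by picking one state per site."""
    return prod(len(s) for s in site_sets)


sets, cost = fitch_sets(TREE, LEAVES, "root")
print(cost)                                  # 2
print(["".join(sorted(s)) for s in sets["root"]])  # ['G', 'C', 'CU']
print(candidate_count(sets["root"]))         # 2
```

Only the root sets are guaranteed to contain exactly the optimal states here; a full reconstruction at every internal node would need an additional top-down pass, which this sketch omits.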
Our results show that our methods reconstruct small sets of high-quality candidate ancestors that agree better with the two target structures than those obtained with classical approaches. Our program is freely available at: http://csb.cs.mcgill.ca/acharnement .