Kolbe Diana L, Eddy Sean R
HHMI Janelia Farm Research Campus, Ashburn, VA 20147, USA.
Bioinformatics. 2009 May 15;25(10):1236-43. doi: 10.1093/bioinformatics/btp154. Epub 2009 Mar 20.
Accuracy of automated structural RNA alignment is improved by using models that consider not only primary sequence but also secondary structure information. However, current RNA structural alignment approaches tend to perform poorly on incomplete sequence fragments, such as single reads from metagenomic environmental surveys, because nucleotides that are expected to be base paired are missing.
We present a local RNA structural alignment algorithm, trCYK, for aligning and scoring incomplete sequences under a model using primary sequence conservation and secondary structure information when possible. The trCYK algorithm improves alignment accuracy and coverage of sequence fragments of structural RNAs in simulated metagenomic shotgun datasets.
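The effect of missing base-pairing partners on a fragment's score can be illustrated with a toy scorer. This is only a sketch under strong assumptions and is not the trCYK algorithm or any part of Infernal: it uses an ungapped placement of the fragment on a consensus, a dot-bracket structure string, and made-up score values. Every name here (`score_fragment`, `pair_partners`, the constants) is hypothetical. It contrasts a naive scorer, which penalizes each paired position whose partner lies outside the fragment, with a truncation-aware scorer, which falls back to primary-sequence scoring at those positions.

```python
# Toy illustration only: NOT the trCYK algorithm and NOT an Infernal API.
# All names and score values below are hypothetical.

CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                   ("C", "G"), ("G", "U"), ("U", "G")}

PAIR_MATCH = 3.0         # both paired residues present and able to pair
PAIR_MISMATCH = -2.0     # both present, but not a canonical pair
SINGLE_MATCH = 1.0       # residue equals the consensus residue
SINGLE_MISMATCH = -1.0   # residue differs from the consensus residue
MISSING_PENALTY = -4.0   # naive scorer: pairing partner is absent


def pair_partners(structure):
    """Map each paired index to its partner in a dot-bracket string."""
    stack, partners = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partners[i], partners[j] = j, i
    return partners


def score_fragment(fragment, offset, consensus, structure, truncation_aware):
    """Score an ungapped fragment placed at `offset` on the consensus."""
    partners = pair_partners(structure)
    end = offset + len(fragment)
    total = 0.0
    for pos in range(offset, end):
        residue = fragment[pos - offset]
        partner = partners.get(pos)
        if partner is not None and offset <= partner < end:
            # Both halves of the pair are present: score the pair once.
            if pos < partner:
                other = fragment[partner - offset]
                paired = (residue, other) in CANONICAL_PAIRS
                total += PAIR_MATCH if paired else PAIR_MISMATCH
        elif partner is not None and not truncation_aware:
            # Naive scorer: treat the missing partner as an error.
            total += MISSING_PENALTY
        else:
            # Unpaired column, or truncated pair scored on sequence alone.
            total += SINGLE_MATCH if residue == consensus[pos] else SINGLE_MISMATCH
    return total


consensus = "GGGAAAUCCC"
structure = "(((....)))"
fragment = "GGGAAA"  # 5' half of the hairpin; the 3' partners are missing

naive = score_fragment(fragment, 0, consensus, structure, truncation_aware=False)
aware = score_fragment(fragment, 0, consensus, structure, truncation_aware=True)
print(naive, aware)
```

In this toy setting, the complete sequence receives the same score under both scorers, while the fragment is penalized heavily only by the naive one. The real algorithm works on covariance models with dynamic programming, which this sketch does not attempt to reproduce.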
The source code for Infernal 1.0, which includes trCYK, is available at http://infernal.janelia.org.