Bradley Robert K, Pachter Lior, Holmes Ian
Biophysics Graduate Group, Department of Mathematics and Department of Bioengineering, University of California, Berkeley, CA 94720, USA.
Bioinformatics. 2008 Dec 1;24(23):2677-83. doi: 10.1093/bioinformatics/btn495. Epub 2008 Sep 16.
Whole-genome screens suggest that eukaryotic genomes are dense with non-coding RNAs (ncRNAs). We introduce a novel approach to RNA multiple alignment that couples a generative probabilistic model of sequence and structure with an efficient sequence-annealing strategy for exploring the space of multiple alignments. The result is a new program, Stemloc-AMA, that aligns multiple related RNA sequences both accurately and specifically.
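The abstract names sequence annealing without saying how it explores the alignment space. The sketch below is a simplified, hypothetical illustration of the general idea and not Stemloc-AMA's implementation. It assumes that the probabilistic model has already assigned a weight to each candidate pair of residues; the numbers in the example are made up and stand in for posterior match probabilities. It ignores RNA secondary structure and uses a fixed stopping threshold (`threshold`, an assumed parameter). Starting from the fully unaligned state, the sketch merges columns in decreasing order of weight. It rejects any merge that would contradict the residue order of some sequence.

```python
from typing import Dict, List, Set, Tuple

Residue = Tuple[int, int]  # (sequence index, position within that sequence)


class AlignmentPoset:
    """A partial multiple alignment: columns of residues, ordered by every sequence."""

    def __init__(self, lengths: List[int]) -> None:
        self.lengths = lengths
        self.col_of: Dict[Residue, int] = {}
        self.members: Dict[int, Set[Residue]] = {}
        # Null alignment: every residue starts in a column of its own.
        for s, n in enumerate(lengths):
            for i in range(n):
                cid = len(self.col_of)
                self.col_of[(s, i)] = cid
                self.members[cid] = {(s, i)}

    def successors(self, cid: int) -> Set[int]:
        # Each column must precede the column holding the next residue of each of
        # its sequences; these constraints form a directed acyclic graph.
        return {
            self.col_of[(s, i + 1)]
            for s, i in self.members[cid]
            if i + 1 < self.lengths[s]
        }

    def reachable(self, src: int, dst: int) -> bool:
        # Depth-first search along the ordering constraints.
        stack, seen = [src], {src}
        while stack:
            c = stack.pop()
            if c == dst:
                return True
            for nxt in self.successors(c):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    def try_merge(self, a: Residue, b: Residue) -> bool:
        ca, cb = self.col_of[a], self.col_of[b]
        if ca == cb:
            return True  # already in the same column
        # If one column is forced to come before the other, merging them would
        # create a cycle, i.e. an alignment that no set of gapped rows can represent.
        if self.reachable(ca, cb) or self.reachable(cb, ca):
            return False
        for r in self.members.pop(cb):
            self.col_of[r] = ca
            self.members[ca].add(r)
        return True


def anneal(lengths: List[int], candidates: List[Tuple[float, Residue, Residue]],
           threshold: float = 0.5) -> Tuple[AlignmentPoset, List[Tuple[Residue, Residue]]]:
    """Greedily merge candidate residue pairs in decreasing order of weight."""
    poset = AlignmentPoset(lengths)
    accepted = []
    for weight, a, b in sorted(candidates, key=lambda c: c[0], reverse=True):
        if weight < threshold:
            break  # stop early: the remaining pairs stay unaligned
        if poset.try_merge(a, b):
            accepted.append((a, b))
    return poset, accepted


lengths = [3, 3]
candidates = [
    (0.9, (0, 0), (1, 0)),
    (0.8, (0, 1), (1, 1)),
    (0.7, (0, 2), (1, 0)),  # conflicts with the pairs above, so it is rejected
    (0.3, (0, 2), (1, 2)),  # below the threshold, so annealing stops before it
]
poset, accepted = anneal(lengths, candidates)
print(accepted)            # [((0, 0), (1, 0)), ((0, 1), (1, 1))]
print(len(poset.members))  # 4 columns: two matched, two singletons
```

Stopping at a threshold leaves weakly supported residues unaligned. In this toy version, any pair whose weight never clears the threshold simply stays in singleton columns.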
When tested on the benchmark datasets BRalibase II and BRalibase 2.1, Stemloc-AMA has comparable sensitivity to and better specificity than the best competing methods. We use a large-scale random sequence experiment to show that while most alignment programs maximize sensitivity at the expense of specificity, even to the point of giving complete alignments of non-homologous sequences, Stemloc-AMA aligns only sequences with detectable homology and leaves unrelated sequences largely unaligned. Such accurate and specific alignments are crucial for comparative-genomics analysis, from inferring phylogeny to estimating substitution rates across different lineages.
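The trade-off between sensitivity and specificity becomes concrete when alignments are scored as sets of aligned residue pairs. This sketch assumes the common benchmark definitions: sensitivity is the fraction of reference pairs that are recovered, and specificity is the fraction of predicted pairs that appear in the reference (also called positive predictive value). The exact measures used in the BRalibase evaluation may differ, and the function names are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

Residue = Tuple[int, int]       # (sequence index, position)
Pair = Tuple[Residue, Residue]  # two residues placed in the same column


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    # Treat (a, b) and (b, a) as the same homology statement.
    return {tuple(sorted(p)) for p in pairs}


def sensitivity(predicted: Iterable[Pair], reference: Iterable[Pair]) -> Optional[float]:
    ref = normalize(reference)
    if not ref:
        return None  # undefined when the reference asserts no homology
    return len(normalize(predicted) & ref) / len(ref)


def specificity(predicted: Iterable[Pair], reference: Iterable[Pair]) -> Optional[float]:
    pred = normalize(predicted)
    if not pred:
        return None  # undefined when nothing is aligned
    return len(pred & normalize(reference)) / len(pred)


reference = [((0, i), (1, i)) for i in range(4)]
aggressive = [((0, 0), (1, 0)), ((0, 1), (1, 1)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
conservative = [((0, 0), (1, 0)), ((0, 1), (1, 1))]

print(sensitivity(aggressive, reference), specificity(aggressive, reference))      # 0.5 0.5
print(sensitivity(conservative, reference), specificity(conservative, reference))  # 0.5 1.0

# Unrelated sequences: the true alignment contains no homologous pairs.
print(sensitivity(aggressive, []), specificity(aggressive, []))  # None 0.0
```

For unrelated sequences, every aligned pair is a false positive. An aligner that fully aligns them therefore scores a specificity of 0, whereas leaving them unaligned asserts no false homology at all.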
Stemloc-AMA is available from http://biowiki.org/StemLocAMA as part of the dart software package for sequence analysis.