CMfinder--a covariance model based RNA motif finding algorithm.

Authors

Yao Zizhen, Weinberg Zasha, Ruzzo Walter L

Affiliation

Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195-2350, USA.

Publication information

Bioinformatics. 2006 Feb 15;22(4):445-52. doi: 10.1093/bioinformatics/btk008. Epub 2005 Dec 15.

Abstract

MOTIVATION

The recent discoveries of large numbers of non-coding RNAs and computational advances in genome-scale RNA search create a need for tools for automatic, high quality identification and characterization of conserved RNA motifs that can be readily used for database search. Previous tools fall short of this goal.

RESULTS

CMfinder is a new tool to predict RNA motifs in unaligned sequences. It is an expectation maximization algorithm using covariance models for motif description, featuring novel integration of multiple techniques for effective search of motif space, and a Bayesian framework that blends mutual information-based and folding energy-based approaches to predict structure in a principled way. Extensive tests show that our method works well on datasets with either low or high sequence similarity, is robust to inclusion of lengthy extraneous flanking sequence and/or completely unrelated sequences, and is reasonably fast and scalable. In testing on 19 known ncRNA families, including some difficult cases with poor sequence conservation and large indels, our method demonstrates excellent average per-base-pair accuracy--79% compared with at most 60% for alternative methods. More importantly, the resulting probabilistic model can be directly used for homology search, allowing iterative refinement of structural models based on additional homologs. We have used this approach to obtain highly accurate covariance models of known RNA motifs based on small numbers of related sequences, which identified homologs in deeply-diverged species.
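The abstract names mutual information as one of the two signals used to predict structure, but does not say how it is computed. As a rough illustration only, and not CMfinder's actual implementation, the sketch below computes the standard mutual information between two columns of an RNA alignment. The function name `column_mutual_information`, the gap character `-`, and the toy columns are assumptions made for this example.

```python
import math
from collections import Counter


def column_mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    col_i and col_j hold one character per sequence, for example "GGCC".
    Rows with a gap ("-") in either column are skipped (an assumption of
    this sketch, not something stated in the abstract).
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must cover the same number of sequences")

    # Keep only rows where both positions are ungapped.
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0

    joint = Counter(pairs)                 # counts of (a, b) pairs
    left = Counter(a for a, _ in pairs)    # marginal counts, column i
    right = Counter(b for _, b in pairs)   # marginal counts, column j

    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a = left[a] / n
        p_b = right[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Two columns that covary perfectly (G-C, C-G, A-U, U-A pairs).
covarying = column_mutual_information("GGCCAAUU", "CCGGUUAA")

# A fully conserved column carries no covariation signal.
conserved = column_mutual_information("AAAA", "ACGU")
```

Columns that change together while preserving base pairing score high, whereas a fully conserved column scores zero even if it is paired. That gap is one plausible reason to combine this signal with folding energy, as the abstract describes.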
