ncRNA consensus secondary structure derivation using grammar strings.

Author information

Achawanantakun Rujira, Sun Yanni, Takyar Seyedeh Shohreh

Affiliation

Computer Science and Engineering Department, Michigan State University, East Lansing, Michigan 48824, USA.

Publication information

J Bioinform Comput Biol. 2011 Apr;9(2):317-37. doi: 10.1142/s0219720011005501.

Abstract

Many noncoding RNAs (ncRNAs) function through both their sequences and secondary structures, so secondary structure derivation is an important problem in RNA research. State-of-the-art structure annotation tools are based on comparative analysis, which derives the consensus structure of homologous ncRNAs. Despite promising results from existing ncRNA alignment and consensus structure derivation tools, there is a need for more efficient and accurate ncRNA secondary structure modeling and alignment methods. In this work, we introduce a consensus structure derivation approach based on the grammar string, a novel ncRNA secondary structure representation that encodes an ncRNA's sequence and secondary structure in the parameter space of a context-free grammar (CFG) and a full RNA grammar including pseudoknots. Because a grammar string is a string defined on a special alphabet constructed from a grammar, it converts ncRNA alignment into sequence alignment. Using grammar string alignment, we derive consensus secondary structures from hundreds of ncRNA families from BraliBase 2.1 and from 25 families containing pseudoknots. Our experiments show that grammar string-based structure derivation competes favorably in consensus structure quality with Murlet and RNASampler. Source code and experimental data are available at http://www.cse.msu.edu/~yannisun/grammar-string.
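The abstract does not spell out the grammar or the alignment scoring, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a toy pseudoknot-free grammar (S -> x S y S | x S | epsilon), dot-bracket input, and an arbitrary scoring scheme; the names `grammar_string`, `symbol_score`, and `align_score` are hypothetical. Each production used in the leftmost derivation becomes one symbol, so two structured RNAs can be compared with ordinary global sequence alignment.

```python
from typing import List


def pair_table(structure: str) -> List[int]:
    """Map each position to its pairing partner (-1 if unpaired)."""
    partner = [-1] * len(structure)
    stack: List[int] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def grammar_string(seq: str, structure: str) -> List[str]:
    """Encode sequence + structure as the productions of a leftmost derivation.

    Assumed toy grammar (not taken from the paper):
      P:xy  S -> x S y S   (x pairs with y)
      L:x   S -> x S       (x is unpaired)
      E     S -> epsilon
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    partner = pair_table(structure)
    out: List[str] = []

    def derive(i: int, j: int) -> None:
        # Derive the substring seq[i:j] from one nonterminal S.
        while i < j:
            if partner[i] == -1:
                out.append("L:" + seq[i])
                i += 1
            else:
                k = partner[i]
                out.append("P:" + seq[i] + seq[k])
                derive(i + 1, k)  # the S enclosed by the base pair
                i = k + 1         # continue with the S after the pair
        out.append("E")

    derive(0, len(seq))
    return out


def symbol_score(a: str, b: str) -> int:
    """Assumed scoring: identical symbol 2, same production type 1, else -1."""
    if a == b:
        return 2
    return 1 if a[0] == b[0] else -1


def align_score(x: List[str], y: List[str], gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score over grammar-string symbols."""
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + symbol_score(x[i - 1], y[j - 1]),
                dp[i - 1][j] + gap,
                dp[i][j - 1] + gap,
            )
    return dp[n][m]


if __name__ == "__main__":
    a = grammar_string("GGGAAACCC", "(((...)))")
    b = grammar_string("GCGAUACGC", "(((...)))")
    print(a)
    print(align_score(a, b))
```

In this sketch, structural agreement is rewarded even when the bases differ, because two symbols with the same production type score higher than symbols of different types. Handling pseudoknots, as the full RNA grammar in the abstract does, would need a richer grammar than the one assumed here.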

