A local multiple alignment method for detection of non-coding RNA sequences.

Author information

Tabei Yasuo, Asai Kiyoshi

Affiliation

Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba, Japan.

Publication information

Bioinformatics. 2009 Jun 15;25(12):1498-505. doi: 10.1093/bioinformatics/btp261. Epub 2009 Apr 17.

Abstract

MOTIVATION

Non-coding RNAs (ncRNAs) show a unique evolutionary process in which the substitutions of distant bases are correlated in order to conserve the secondary structure of the ncRNA molecule. Therefore, a multiple alignment method for the detection of ncRNAs should take into account both the primary sequence and the secondary structure. Recently, there has been intense focus on multiple alignment investigations for the detection of ncRNAs; however, most of the proposed methods are designed for global multiple alignments. For this reason, these methods are not appropriate for identifying locally conserved ncRNAs among genomic sequences. A more efficient local multiple alignment method for the detection of ncRNAs is required.

RESULTS

We propose a new local multiple alignment method for the detection of ncRNAs. This method uses a local multiple alignment construction procedure inspired by ProDA, which is a local multiple alignment program for protein sequences with repeated and shuffled elements. To align sequences based on secondary structure information, we propose a new alignment model which incorporates secondary structure features. We define the conditional probability of an alignment via a conditional random field and use a gamma-centroid estimator to align sequences. The locally aligned subsequences are clustered into blocks of approximately globally alignable subsequences between pairwise alignments. Finally, these blocks are multiply aligned via MXSCARNA. In benchmark experiments, we demonstrate that the implemented software, SCARNA_LM, performs well in local multiple alignment for the detection of ncRNAs.
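
The gamma-centroid step mentioned above can be illustrated with a minimal sketch. It assumes that a matrix of posterior match probabilities p_ij is already available (in the paper these would come from the conditional random field; here the matrix is simply made up), and it uses the standard form of gamma-centroid decoding, which selects the set of aligned pairs maximizing the sum of (gamma + 1) * p_ij - 1. The function name `gamma_centroid_alignment` and the example values are hypothetical, and the sketch does not reproduce the local alignment or block clustering procedure of SCARNA_LM.

```python
def gamma_centroid_alignment(post, gamma):
    """Return aligned index pairs (i, j) maximizing sum((gamma + 1) * p_ij - 1).

    post[i][j] is an assumed posterior probability that position i of the
    first sequence is aligned to position j of the second sequence.
    """
    n = len(post)
    m = len(post[0]) if n else 0

    # score[i][j]: best total gain using the first i rows and first j columns.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            gain = (gamma + 1.0) * post[i - 1][j - 1] - 1.0
            score[i][j] = max(
                score[i - 1][j - 1] + gain,  # align i-1 with j-1
                score[i - 1][j],             # leave i-1 unaligned
                score[i][j - 1],             # leave j-1 unaligned
            )

    # Trace back, keeping only pairs that contribute a positive gain.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        gain = (gamma + 1.0) * post[i - 1][j - 1] - 1.0
        if gain > 0 and score[i][j] == score[i - 1][j - 1] + gain:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


# Made-up posterior matrix for two sequences of length 3.
example_post = [
    [0.9, 0.05, 0.0],
    [0.1, 0.2, 0.1],
    [0.0, 0.1, 0.8],
]
strict_pairs = gamma_centroid_alignment(example_post, gamma=1)
lenient_pairs = gamma_centroid_alignment(example_post, gamma=9)
```

A pair contributes a positive gain only when p_ij exceeds 1 / (gamma + 1), so a small gamma keeps only high-confidence pairs while a larger gamma admits weaker ones. In this example the low-probability middle pair is excluded at gamma = 1 and included at gamma = 9.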

AVAILABILITY

The C++ source code for SCARNA_LM and its experimental datasets are available at http://www.ncrna.org/software/scarna_lm/download.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

