Phylogenetically enhanced statistical tools for RNA structure prediction.

Author information

Akmaev V R, Kelley S T, Stormo G D

Affiliation

Dept. of Applied Mathematics, Box 526, University of Colorado, Boulder, CO 80309, USA.

Publication information

Bioinformatics. 2000 Jun;16(6):501-12. doi: 10.1093/bioinformatics/16.6.501.

PMID:10980147

Abstract

MOTIVATION

Methods that predict the structure of molecules by looking for statistical correlation have been quite effective. Unfortunately, these methods often disregard phylogenetic information in the sequences they analyze. Here, we present a number of statistics for RNA molecular-structure prediction. Besides common pair-wise comparisons, we consider a few reasonable statistics for base-triple predictions, and present an elaborate analysis of these methods. All these statistics incorporate phylogenetic relationships of the sequences in the analysis to varying degrees, and the different nature of these tests gives a wide choice of statistical tools for RNA structure prediction.

RESULTS

Starting from statistics that incorporate phylogenetic information only as independent sequence evolution models for each position of a multiple alignment, and extending this idea to a joint evolution model of two positions, we enhance the usual purely statistical methods (e.g. methods based on the Mutual Information statistic) with the use of phylogenetic information available in the sequences. In particular, we present a joint model based on the HKY evolution model, and consequently a χ² test of independence for two positions. A significant part of this work is devoted to some mathematical analysis of these methods. We tested these statistics on regions of 16S and 23S rRNA, and tRNA.
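For orientation, the "purely statistical" baseline mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' method: it computes the plain Mutual Information and a classical Pearson χ² statistic for two alignment columns, treating every sequence as an independent sample. That independence assumption is exactly what the phylogenetic models in the paper are meant to relax, and the HKY-based joint model is not implemented here. The function names and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def _check_columns(col_i, col_j):
    # Both columns must hold one character per aligned sequence.
    if len(col_i) != len(col_j):
        raise ValueError("columns must have one entry per sequence")
    if len(col_i) == 0:
        raise ValueError("columns must not be empty")


def mutual_information(col_i, col_j):
    """Plain mutual information (in bits) between two alignment columns.

    Assumption: sequences are treated as independent samples, so any
    phylogenetic relationship between them is ignored.
    """
    _check_columns(col_i, col_j)
    n = len(col_i)
    single_i = Counter(col_i)
    single_j = Counter(col_j)
    pairs = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(p_ab * n * n / (single_i[a] * single_j[b]))
    return mi


def pearson_chi_square(col_i, col_j):
    """Classical Pearson chi-square statistic and degrees of freedom.

    Assumption: this is the textbook contingency-table test, not the
    HKY-based test of independence described in the abstract.
    """
    _check_columns(col_i, col_j)
    n = len(col_i)
    single_i = Counter(col_i)
    single_j = Counter(col_j)
    pairs = Counter(zip(col_i, col_j))
    stat = 0.0
    for a, n_a in single_i.items():
        for b, n_b in single_j.items():
            expected = n_a * n_b / n
            observed = pairs.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(single_i) - 1) * (len(single_j) - 1)
    return stat, dof


# Toy columns (hypothetical data): position j always holds the
# Watson-Crick partner of position i.
paired_i = "GGCCAAUU"
paired_j = "CCGGUUAA"
mi_bits = mutual_information(paired_i, paired_j)
chi2, dof = pearson_chi_square(paired_i, paired_j)
```

On the toy columns, both statistics are high because the two positions covary perfectly. With real alignments, closely related sequences can inflate such scores, which is the motivation the abstract gives for adding phylogenetic information.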
