Kennedy Ryan, Lladser Manuel E, Yarus Michael, Knight Rob
Department of Computer Science, University of Colorado at Boulder, 430 UCB, Boulder, CO 80309-0430, USA.
Front Biosci. 2008 May 1;13:6060-71. doi: 10.2741/3137.
The abundance of simple but functional RNA sites in random-sequence pools is critical for understanding emergence of RNA functions in nature and in the laboratory today. The complexity of a site is typically measured in terms of information, i.e. the Shannon entropy of the positions in a multiple sequence alignment. However, this calculation can be incorrect by many orders of magnitude. Here we compare several methods for estimating the abundance of RNA active-site patterns in the context of in vitro selection (SELEX), highlighting the strengths and weaknesses of each. We include in these methods a new approach that yields confidence bounds for the exact probability of finding specific kinds of RNA active sites. We show that all of the methods that take modularity into account provide far more accurate estimates of this probability than the informational methods, and that fast approximate methods are suitable for a wide range of RNA motifs.
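As a point of reference, the informational estimate mentioned above can be sketched as follows. This is a minimal illustration, not the authors' new method: it assumes a uniform random pool over A, C, G, U, ignores gaps and small-sample corrections, and uses a made-up toy alignment. The function names (`column_information`, `site_information`, `naive_site_probability`) are hypothetical. It computes the information of each alignment column as log2(4) minus the column's Shannon entropy, sums over columns, and turns the total I into the naive probability 2^-I that a random window matches the site. This is the kind of calculation the abstract says can be wrong by many orders of magnitude, since it does not account for modularity.

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGU"):
    """Information (bits) of one alignment column: log2(|alphabet|) - H.

    Characters outside the alphabet (e.g. gaps) are ignored; this is an
    assumption of the sketch, not something specified by the source.
    """
    counts = Counter(column)
    total = sum(counts[base] for base in alphabet)
    entropy = 0.0
    for base in alphabet:
        if counts[base]:
            p = counts[base] / total
            entropy -= p * math.log2(p)
    return math.log2(len(alphabet)) - entropy


def site_information(alignment):
    """Total information (bits) of a site, summed over alignment columns."""
    return sum(column_information(column) for column in zip(*alignment))


def naive_site_probability(alignment):
    """Informational estimate: chance that one random window matches, 2^-I."""
    return 2.0 ** -site_information(alignment)


# Toy alignment, invented for illustration: three fixed positions
# followed by one position that is C or U with equal frequency.
alignment = ["GGAC", "GGAU", "GGAC", "GGAU"]
bits = site_information(alignment)
probability = naive_site_probability(alignment)
print(f"{bits:.1f} bits, per-window probability {probability:.6f}")
```

For the toy alignment the three invariant columns contribute 2 bits each and the C/U column contributes 1 bit, so the per-window estimate equals (1/4)^3 × (1/2). The estimate treats the site as a single contiguous block at a fixed position, which is exactly the simplification that modularity-aware methods relax.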