Michal Shahar, Ivry Tor, Sipper Moshe, Barash Danny, Schalit-Cohen Omer
IEEE/ACM Trans Comput Biol Bioinform. 2007 Oct-Dec;4(4):596-610. doi: 10.1109/tcbb.2007.1045.
We focus on finding a consensus motif in a set of homologous or functionally related RNA molecules. Recent approaches to this problem are limited to simple motifs, require sequence alignment, and make prior assumptions about the data set. We use genetic programming to predict RNA consensus motifs based solely on the data set. Our system, dubbed GeRNAMo (Genetic programming of RNA Motifs), predicts the most common motifs without sequence alignment and can handle motifs of any size. The program requires only the maximum number of stems in the motif; if prior knowledge is available, the user can specify other attributes of the motif (e.g., the motif's minimum and maximum sizes), thereby increasing both sensitivity and speed. We describe several experiments using ferritin iron response element (IRE), signal recognition particle (SRP), or microRNA sequences, showing that the most common motif is found repeatedly and that our system offers substantial advantages over previous methods.