Fogel Gary B, Porto V William, Weekes Dana G, Fogel David B, Griffey Richard H, McNeil John A, Lesnik Elena, Ecker David J, Sampath Rangarajan
Natural Selection Inc., 3333 North Torrey Pines Court, Suite 200, La Jolla, CA 92037, USA.
Nucleic Acids Res. 2002 Dec 1;30(23):5310-7. doi: 10.1093/nar/gkf653.
RNA molecules fold into characteristic secondary and tertiary structures that account for their diverse functional activities. Many of these RNA structures, or certain structural motifs within them, are thought to recur in multiple genes within a single organism or across the same gene in several organisms, and to provide a common regulatory mechanism. Search algorithms, such as RNAMotif, can be used to mine nucleotide sequence databases for these repeating motifs. RNAMotif allows users to capture essential features of known structures in detailed descriptors and can be used to identify, with high specificity, other similar motifs within the nucleotide database. However, when the descriptor constraints are relaxed to provide more flexibility, or when there is very little a priori information about hypothesized RNA structures, the number of motif 'hits' may become very large. Exhaustive methods to search for similar RNA structures over these large search spaces are likely to be computationally intractable. Here we describe a powerful new algorithm based on evolutionary computation to solve this problem. A series of experiments using ferritin IRE and SRP RNA stem-loop motifs was used to verify the method. We demonstrate that even when searching extremely large search spaces, on the order of 10^23 potential solutions, we could find the correct solution in a fraction of the time it would have taken for exhaustive comparisons.
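To make the scale of the problem concrete, the following is a minimal Python sketch of an evolutionary search over a combinatorial space of motif hits. It is not the authors' algorithm: the abstract does not give the representation, the variation operators, or the scoring function, so everything here is an assumption for illustration. The sketch assumes each sequence contributes a "bin" of candidate hits, that a solution picks one hit per bin, and that similarity can be scored by pairwise Hamming distance between equal-length strings (a stand-in for a real structural comparison). The names `make_toy_bins`, `score`, `evolve`, and `search_space_size` are hypothetical.

```python
import math
import random


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def score(choice, bins):
    """Total pairwise Hamming distance among the chosen hits (lower is better)."""
    picked = [bins[i][j] for i, j in enumerate(choice)]
    return sum(
        hamming(picked[i], picked[k])
        for i in range(len(picked))
        for k in range(i + 1, len(picked))
    )


def search_space_size(bins):
    """Number of ways to pick exactly one hit from every bin."""
    return math.prod(len(b) for b in bins)


def make_toy_bins(n_bins=6, per_bin=50, length=12, seed=1):
    """Random toy hits over ACGU, with one shared motif planted in each bin."""
    rng = random.Random(seed)
    motif = "".join(rng.choice("ACGU") for _ in range(length))
    bins = []
    for _ in range(n_bins):
        hits = [
            "".join(rng.choice("ACGU") for _ in range(length))
            for _ in range(per_bin)
        ]
        hits[rng.randrange(per_bin)] = motif
        bins.append(hits)
    return bins, motif


def evolve(bins, pop_size=30, generations=200, seed=0):
    """Elitist (mu + lambda) search: mutate one bin index per child, keep the best."""
    rng = random.Random(seed)
    pop = [[rng.randrange(len(b)) for b in bins] for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for parent in pop:
            child = parent[:]
            i = rng.randrange(len(bins))
            child[i] = rng.randrange(len(bins[i]))  # re-draw the hit for one bin
            offspring.append(child)
        # Parents compete with offspring, so the best score never gets worse.
        pop = sorted(pop + offspring, key=lambda c: score(c, bins))[:pop_size]
    best = min(pop, key=lambda c: score(c, bins))
    return best, score(best, bins)


if __name__ == "__main__":
    bins, motif = make_toy_bins()
    best, best_score = evolve(bins)
    print("search space size:", search_space_size(bins))
    print("best choice:", best, "score:", best_score)
```

With the toy defaults the space holds 50^6 combinations, far smaller than the 10^23 reported in the abstract, but the same product-of-bin-sizes growth is what makes exhaustive comparison impractical as the number of sequences or hits per sequence increases. Whether the search recovers the planted motif depends on the seed and the parameters; the sketch only guarantees that the best score does not worsen from one generation to the next.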