Gevertz Jana, Gan Hin Hark, Schlick Tamar
Summer Undergraduate Research Program, New York University School of Medicine, New York, 10003, USA.
RNA. 2005 Jun;11(6):853-63. doi: 10.1261/rna.7271405.
In vitro selection of functional RNAs from large random sequence pools has led to the identification of many ligand-binding and catalytic RNAs. However, the structural diversity in random pools is not well understood. Such an understanding is a prerequisite for designing sequence pools that increase the probability of finding complex functional RNAs by in vitro selection techniques. Toward this goal, we computationally generated five random pools of RNA sequences of length up to 100 nt to mimic experiments and characterized the distribution of the associated secondary structural motifs using sets of possible RNA tree structures derived from graph theory techniques. Our results show that such random pools heavily favor simple topological structures: for example, linear stem-loop and low-branching motifs are favored over complex structures with high-order junctions, as confirmed by known aptamers. Moreover, we quantify the rise of structural complexity with sequence length and report the dominant class of tree motifs (characterized by vertex number) for each pool. These analyses not only show that random pools do not lead to a uniform distribution of possible RNA secondary topologies; they also point to avenues for designing pools that contain specific simple and complex structures in equal abundance, with the goal of broadening the range of functional RNAs discovered by in vitro selection. Specifically, the optimal RNA sequence pool length for identifying a structure with x stems is 20x nt.
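As a rough illustration of the workflow the abstract describes, the following Python sketch generates random RNA pools, applies the stated 20x length rule, and derives a tree vertex count from a secondary structure given in dot-bracket notation. It is a minimal sketch under several assumptions and not the authors' implementation. The function names (`random_pool`, `suggested_pool_length`, `count_stems`, `tree_vertex_count`) are hypothetical. Nucleotides are drawn uniformly, and the pool lengths in the usage section are illustrative only. A stem is taken to be any maximal run of directly stacked base pairs, and each stem is treated as one tree edge, so a tree has one more vertex than it has stems. The folding step that would turn sequences into structures is not shown and would need an external secondary structure predictor.

```python
import random


def random_pool(n_seqs, length, seed=None):
    """Return n_seqs random RNA sequences of the given length.

    Assumption: A, C, G, U are drawn uniformly and independently.
    """
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGU") for _ in range(length))
            for _ in range(n_seqs)]


def suggested_pool_length(n_stems):
    """Pool length (nt) from the abstract's rule: x stems -> 20x."""
    return 20 * n_stems


def count_stems(dot_bracket):
    """Count stems in a pseudoknot-free dot-bracket string.

    Simplifying assumption: a stem is a maximal run of stacked pairs
    (i, j), (i+1, j-1), ...; any bulge or loop starts a new stem.
    """
    stack = []
    pairs = {}  # maps 5' index -> 3' index of each base pair
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs[stack.pop()] = i
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' in structure")
    # A pair opens a new stem unless the pair enclosing it is directly stacked.
    return sum(1 for i, j in pairs.items() if pairs.get(i - 1) != j + 1)


def tree_vertex_count(dot_bracket):
    """Vertex count of the tree whose edges are stems (V = E + 1)."""
    return count_stems(dot_bracket) + 1


if __name__ == "__main__":
    # Illustrative lengths only: one pool per target stem count 1..5.
    for x in range(1, 6):
        length = suggested_pool_length(x)
        pool = random_pool(3, length, seed=x)
        print(x, length, pool[0])
    # Hand-written example structures (not predicted folds).
    for structure in ["((((....))))", "((..))..((..))"]:
        print(structure, tree_vertex_count(structure))
```

Grouping folded sequences by `tree_vertex_count` would give a per-pool histogram of motif classes, which is the kind of distribution the abstract reports. The stem definition used here is deliberately coarse, so a real analysis would substitute the tree-graph construction rules of the method actually used.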