Yamada Tomoyuki, Morishita Shinichi
Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Japan.
Bioinformatics. 2005 Apr 15;21(8):1316-24. doi: 10.1093/bioinformatics/bti155. Epub 2004 Nov 25.
Designing highly effective short interfering RNA (siRNA) sequences with maximum target-specificity for mammalian RNA interference (RNAi) is one of the hottest topics in molecular biology. The relationship between siRNA sequences and RNAi activity has been studied extensively to establish rules for selecting highly effective sequences. However, there is also a pressing need to efficiently compute siRNA sequences that minimize off-target silencing effects, which requires matching each candidate against all non-targeted sequences while allowing for mismatches.
The enumeration of potential cross-hybridization candidates is non-trivial, because siRNA sequences are short, ca. 19 nt in length, and at least three mismatches with non-targets are required. With three mismatches in a 19-nt sequence, the longest run of contiguous matches is typically only four or five bases, so a BLAST search, which depends on longer exact-match seeds, frequently overlooks off-target candidates. By contrast, existing accurate approaches are expensive to execute; we therefore develop an accurate, efficient algorithm that uses seed hashing, the pigeonhole principle, and combinatorics to identify mismatch patterns. Tests show that our method can rapidly list potential cross-hybridization candidates for any siRNA sequence of a selected human gene, outperforming traditional methods by orders of magnitude in computational performance.
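The pigeonhole argument can be illustrated with a minimal Python sketch. This is not the authors' implementation; the function names (`seed_spans`, `build_index`, `find_near_matches`), the block layout, and the default threshold of three mismatches are assumptions made for illustration. If a 19-nt siRNA aligns to a site with at most three mismatches, then splitting the siRNA into four blocks (of 5, 5, 5, and 4 nt) guarantees that at least one block matches exactly, so hashing exact seeds and verifying each hit by Hamming distance finds every such site.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def seed_spans(length, max_mismatches):
    """Split [0, length) into max_mismatches + 1 contiguous blocks.

    By the pigeonhole principle, at most max_mismatches mismatches
    cannot touch all max_mismatches + 1 blocks, so one block is exact.
    """
    parts = max_mismatches + 1
    base, extra = divmod(length, parts)
    spans, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        spans.append((start, start + size))
        start += size
    return spans


def build_index(text, seed_lengths):
    """Hash every substring of the given lengths to its start positions."""
    index = defaultdict(list)
    for k in seed_lengths:
        for i in range(len(text) - k + 1):
            index[text[i:i + k]].append(i)
    return index


def find_near_matches(sirna, text, max_mismatches=3):
    """Return sorted (start, mismatches) pairs for all sites in text
    that match sirna with at most max_mismatches substitutions.

    Hypothetical helper: the threshold is an assumed parameter.
    """
    n = len(sirna)
    spans = seed_spans(n, max_mismatches)
    index = build_index(text, {end - start for start, end in spans})
    hits = set()
    for start, end in spans:
        # Look up exact occurrences of this block, then verify the
        # full-length alignment implied by each occurrence.
        for pos in index.get(sirna[start:end], ()):
            site = pos - start
            if 0 <= site <= len(text) - n:
                d = hamming(sirna, text[site:site + n])
                if d <= max_mismatches:
                    hits.add((site, d))
    return sorted(hits)
```

Calling `find_near_matches(sirna, transcript)` on a 19-nt query and a transcript string returns the candidate sites together with their mismatch counts. In practice the index would be built once over all non-target transcripts and reused for every siRNA; the sketch rebuilds it per call only to stay short.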