Zhao Wenzhong, Fanning M Leigh, Lane Terran
University of New Mexico, Department of Computer Science, Albuquerque, NM 87131-0001, USA.
Artif Intell Med. 2005 Sep-Oct;35(1-2):61-73. doi: 10.1016/j.artmed.2005.01.009.
We address the problem of selecting an efficient set of initiator molecules (siRNAs) for RNA interference (RNAi)-based gene family knockdown experiments. Our goal is to select a minimal set of siRNAs that (a) cover a targeted gene family or a specified subset of it, (b) do not cover any untargeted genes, and (c) are individually highly effective at inducing knockdown.
We give two formalizations of the gene family knockdown problem. First, we show that the problem, minimizing the number of siRNAs required to knock down a family of genes, is NP-Hard via a reduction to the set cover problem. Second, we generalize the basic problem to incorporate additional biological constraints and optimality criteria. To solve the resulting set-cover variants, we modify the classical branch-and-bound algorithm to include some of these biological criteria. We find that in many typical cases these constraints reduce the search space enough that we are able to compute exact minimal siRNA covers within reasonable time. For larger cases, we propose a probabilistic greedy algorithm for finding minimal siRNA covers efficiently. We apply our methods to two different gene families, the FREP genes from Biomphalaria glabrata and the olfactory genes from Caenorhabditis elegans.
Our computational results on real biological data show that the probabilistic greedy algorithm produces siRNA covers as good as the branch-and-bound algorithm in most cases. Both algorithms return minimal siRNA covers with high predicted probability that the selected siRNAs will be effective at inducing knockdown. We also examine the role of "off-target" interactions: the constraint of avoiding covering untargeted genes can, in some cases, substantially increase the complexity of the resulting solution. Overall, we find that in many common cases our approach significantly reduces the number of siRNAs required in gene family knockdown experiments, as compared to knocking down genes independently.
我们致力于解决在基于RNA干扰(RNAi)的基因家族敲低实验中选择一组高效起始分子(siRNA)的问题。我们的目标是选择一组最小的siRNA,其要满足以下条件:(a)覆盖目标基因家族或其特定子集;(b)不覆盖任何非目标基因;(c)个体在诱导敲低方面具有高效性。
我们对基因家族敲低问题给出了两种形式化表述。首先,我们通过归约到集合覆盖问题表明,使敲低一组基因所需的siRNA数量最小化的问题是NP难问题。其次,我们对基本问题进行了推广,纳入了额外的生物学约束和最优标准。为了解决由此产生的集合覆盖变体问题,我们对经典的分支定界算法进行了修改,以纳入其中一些生物学标准。我们发现,在许多典型情况下,这些约束充分减少了搜索空间,使我们能够在合理时间内计算出精确的最小siRNA覆盖集。对于更大规模的情况,我们提出了一种概率贪心算法,以高效地找到最小siRNA覆盖集。我们将我们的方法应用于两个不同的基因家族,即来自光滑双脐螺的FREP基因和来自秀丽隐杆线虫的嗅觉基因。
我们对真实生物学数据的计算结果表明,在大多数情况下,概率贪心算法产生的siRNA覆盖集与分支定界算法的效果相当。两种算法都返回最小的siRNA覆盖集,且所选siRNA诱导敲低有效的预测概率很高。我们还研究了“脱靶”相互作用的作用:避免覆盖非目标基因的约束在某些情况下会显著增加所得解决方案的复杂性。总体而言,我们发现,与独立敲低基因相比,在许多常见情况下,我们的方法显著减少了基因家族敲低实验所需的siRNA数量。