Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China.
Genomics. 2013 Oct;102(4):215-22. doi: 10.1016/j.ygeno.2013.07.009. Epub 2013 Jul 25.
For a successful RNA interference (RNAi) experiment, the critical step is selecting the small interfering RNA (siRNA) candidates that maximize knockdown of the target gene. Although various computational approaches have been attempted, the design of efficient siRNA candidates remains far from satisfactory. In this study, we propose a novel feature selection algorithm that combines random forest and support vector machine to predict active siRNAs. Using a publicly available dataset, we demonstrate that predictive accuracy improves markedly when context sequence features outside the target site are included. The Pearson correlation coefficient for regression is as high as 0.721, compared with 0.671, 0.668, 0.680, and 0.645 for Biopredsi, i-score, ThermoComposition21, and DSIR, respectively. This result indicates that siRNA-target interaction requires appropriate sequence context not only in the target site but also in a broad region flanking the target site.
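As a rough illustration of two ideas in the abstract, the sketch below extracts a target site together with its flanking context and scores predictions with the Pearson correlation coefficient. It is not the authors' implementation: the function names, the 19-nt site length, and the 10-nt flank width are assumptions chosen only for the example, and the random forest and support vector machine steps are not reproduced.

```python
import math


def context_window(mrna, site_start, site_len=19, flank=10):
    """Return the target site plus up to `flank` nucleotides on each side.

    `site_len` and `flank` are illustrative defaults, not values from the paper.
    The window is clipped at the ends of the transcript.
    """
    lo = max(0, site_start - flank)
    hi = min(len(mrna), site_start + site_len + flank)
    return mrna[lo:hi]


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sums of centered cross-products and squares.
    s_pm = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_mm = sum((m - mean_m) ** 2 for m in measured)
    if s_pp == 0 or s_mm == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_pm / math.sqrt(s_pp * s_mm)
```

In use, `context_window` would supply the sequence from which site and flanking features are derived, and `pearson_r` would compare a model's predicted knockdown efficacies against measured ones, which is the kind of figure the abstract reports.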