McQuisten Kyle A, Peek Andrew S
Department of Bioinformatics, Integrated DNA Technologies, Coralville, IA 52241, USA.
BMC Bioinformatics. 2007 Jun 7;8:184. doi: 10.1186/1471-2105-8-184.
Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know which properties of an oligonucleotide sequence associate significantly with antisense activity. For the model to be efficient, we must also know which properties do not associate significantly and can be omitted. This paper discusses the results of a randomization procedure used to find motifs that associate significantly with either high or low antisense suppression activity, an analysis of the properties of those motifs, and the results of support vector machine modelling using the significant motifs as features.
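The abstract does not spell out the randomization procedure, so the following is only a minimal sketch of one common form of such a test: a label-shuffling permutation test on the difference in mean activity between sequences that contain a motif and those that do not. The function name `motif_permutation_test`, its parameters, and the choice of test statistic are assumptions made for illustration and are not taken from the paper.

```python
import random
from statistics import mean


def motif_permutation_test(sequences, activities, motif,
                           n_permutations=2000, seed=0):
    """Two-sided permutation test for a motif/activity association.

    Hypothetical helper, not the authors' procedure. Returns
    (observed_difference, p_value), where the difference is the mean
    activity of sequences containing the motif minus the mean activity
    of sequences lacking it.
    """
    has_motif = [motif in seq for seq in sequences]
    n_with = sum(has_motif)
    if n_with == 0 or n_with == len(sequences):
        raise ValueError("motif must be present in some sequences "
                         "and absent from others")

    def mean_difference(values):
        with_motif = [v for v, h in zip(values, has_motif) if h]
        without = [v for v, h in zip(values, has_motif) if not h]
        return mean(with_motif) - mean(without)

    observed = mean_difference(activities)
    rng = random.Random(seed)
    shuffled = list(activities)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling activities breaks any real motif/activity link.
        rng.shuffle(shuffled)
        if abs(mean_difference(shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

Called on a list of oligonucleotide strings and a parallel list of measured activities, the function returns the observed difference and an empirical p-value. A positive difference with a small p-value would mark the motif as associated with higher values of whatever activity measure is supplied. Testing many motifs this way would also call for a multiple-testing correction, which this sketch leaves out.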
We discovered 155 motifs that associate significantly with high antisense suppression activity and 202 motifs that associate significantly with low suppression activity. The motifs range in length from 2 to 5 bases, include several motifs previously reported to associate strongly with antisense activity, and have thermodynamic properties consistent with previous work relating the thermodynamic properties of sequences to their antisense activity. Statistical analysis revealed no correlation between a motif's position within an antisense sequence and that sequence's antisense activity. Also, many significant motifs existed as subwords of other significant motifs. Support vector regression experiments indicated that the feature set of significant motifs increased correlation compared to all possible motifs as well as several subsets of the significant motifs.
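To make the motif space and the subword relation concrete, here is a small sketch that enumerates every DNA motif of length 2 to 5, turns a sequence into a vector of motif counts, and lists which motifs occur inside longer ones. The helper names, the use of overlapping counts rather than presence/absence, and the `ACGT` alphabet are assumptions for illustration. The abstract does not say how the features were encoded for support vector regression.

```python
from itertools import product


def all_motifs(min_len=2, max_len=5, alphabet="ACGT"):
    """Enumerate every motif over the alphabet with the given lengths."""
    return ["".join(p)
            for k in range(min_len, max_len + 1)
            for p in product(alphabet, repeat=k)]


def count_occurrences(sequence, motif):
    """Count occurrences of motif in sequence, allowing overlaps."""
    last_start = len(sequence) - len(motif)
    return sum(1 for i in range(last_start + 1)
               if sequence.startswith(motif, i))


def motif_features(sequence, motifs):
    """Map a sequence to a count vector, one entry per motif
    (an assumed encoding, not necessarily the one used in the paper)."""
    return [count_occurrences(sequence, m) for m in motifs]


def subword_pairs(motifs):
    """Return (shorter, longer) pairs where the shorter motif is a
    contiguous substring (subword) of the longer one."""
    return [(a, b) for a in motifs for b in motifs
            if len(a) < len(b) and a in b]
```

With these defaults the full motif space has 4^2 + 4^3 + 4^4 + 4^5 = 1360 entries, so the 357 significant motifs reported above make up roughly a quarter of it. Feature vectors built from any chosen motif list can then be passed to a regression library of the reader's choice.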
The thermodynamic properties of the significantly associated motifs support existing data correlating the thermodynamic properties of the antisense oligonucleotide with antisense efficiency. This reinforces our hypothesis that antisense suppression is strongly associated with probe/target thermodynamics, as there are no enzymatic mediators, such as the RNA-induced silencing complex (RISC) in RNAi, to speed the process along. The independence of motif position and antisense activity also allows us to bypass consideration of this feature in the modelling process, promoting model efficiency and reducing the chance of overfitting when predicting antisense activity. The increase in support vector regression (SVR) correlation with significant features compared to nearest-neighbour features indicates that thermodynamics is likely not the only factor determining antisense efficiency.