Qiu Shibin, Lane Terran
Pathwork Diagnostics, Inc., Sunnyvale, CA 94089, USA.
IEEE/ACM Trans Comput Biol Bioinform. 2009 Apr-Jun;6(2):190-9. doi: 10.1109/TCBB.2008.139.
The cell defense mechanism of RNA interference has applications in gene function analysis and promising potential in human disease therapy. To effectively silence a target gene, it is desirable to select appropriate initiator siRNA molecules with satisfactory silencing capabilities. Computational prediction of the silencing efficacy of siRNAs can assist this screening process before they are used in biological experiments. String kernel functions, which operate directly on the string objects representing siRNAs and target mRNAs, have been applied to support vector regression for this prediction and have improved accuracy over numerical kernels in multidimensional vector spaces constructed from descriptors of siRNA design rules. To fully utilize the information provided by string and numerical data, we propose to unify the two in a kernel feature space by devising a multiple kernel regression framework in which a linear combination of the kernels is used. We formulate multiple kernel learning as a quadratically constrained quadratic programming (QCQP) problem, which, although it yields a globally optimal solution, is computationally demanding and requires a commercial solver package. We further propose three heuristics based on the principle of kernel-target alignment and predictive accuracy. Empirical results demonstrate that multiple kernel regression can improve accuracy, decrease model complexity by reducing the number of support vectors, and speed up computation dramatically. In addition, multiple kernel regression evaluates the importance of the constituent kernels, which, for the siRNA efficacy prediction problem, amounts to comparing the relative significance of the design rules. Finally, we give insights into the multiple kernel regression mechanism and point out possible extensions.
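As an illustration of the idea of weighting kernels by kernel-target alignment and then using their linear combination, here is a minimal Python sketch. It is not the authors' method: the abstract does not specify the three heuristics, so the weighting rule (weights proportional to non-negative alignment), the function names, and the toy data with a linear and a Gaussian kernel are all assumptions made for illustration. The alignment used is the standard definition, the Frobenius inner product of the Gram matrix K with the target matrix y y^T, divided by the product of their Frobenius norms.

```python
import numpy as np

def kernel_target_alignment(K, y):
    """Alignment between a Gram matrix K and the target matrix y y^T."""
    Y = np.outer(y, y)
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))

def alignment_weights(kernels, y):
    """Hypothetical heuristic: weights proportional to clipped alignment."""
    a = np.array([max(kernel_target_alignment(K, y), 0.0) for K in kernels])
    return a / a.sum()

def combine_kernels(kernels, weights):
    """Linear combination of Gram matrices with the given weights."""
    return sum(w * K for w, K in zip(weights, kernels))

# Toy data (assumed): 20 samples, 3 numerical features, centered targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -0.5, 0.2])
y = y - y.mean()

# Two constituent kernels standing in for the paper's string/numerical kernels.
K_lin = X @ X.T
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_rbf = np.exp(-0.5 * sq_dists)

kernels = [K_lin, K_rbf]
weights = alignment_weights(kernels, y)
K_combined = combine_kernels(kernels, weights)
print("kernel weights:", weights)
```

The combined Gram matrix could then be passed to any support vector regression implementation that accepts a precomputed kernel. The weights also give a rough indication of the relative importance of each constituent kernel, in the spirit of the comparison described in the abstract.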