Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland.
Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland.
Bioinformatics. 2018 Jul 1;34(13):i509-i518. doi: 10.1093/bioinformatics/bty277.
Many inference problems in bioinformatics, including drug bioactivity prediction, can be formulated as pairwise learning problems, in which one is interested in making predictions for pairs of objects, e.g. drugs and their targets. Kernel-based approaches have emerged as powerful tools for solving such problems. Multiple kernel learning (MKL) is especially promising, as it integrates various types of complex biomedical information sources in the form of kernels and learns their importance for the prediction task. However, the immense size of pairwise kernel spaces remains a major bottleneck, making existing MKL algorithms computationally infeasible even for a small number of input pairs.
We introduce pairwiseMKL, the first method for time- and memory-efficient learning with multiple pairwise kernels. pairwiseMKL first determines the mixture weights of the input pairwise kernels, and then learns the pairwise prediction function. Both steps are performed efficiently without explicit computation of the massive pairwise matrices, making the method applicable to large pairwise learning problems. We demonstrate the performance of pairwiseMKL in two related tasks of quantitative drug bioactivity prediction using up to 167 995 bioactivity measurements and 3120 pairwise kernels: (i) prediction of anticancer efficacy of drug compounds across a large panel of cancer cell lines; and (ii) prediction of target profiles of anticancer compounds across their kinome-wide target spaces. We show that pairwiseMKL provides accurate predictions using solutions that are sparse in terms of selected kernels, and that it therefore also automatically identifies the data sources relevant to the prediction problem.
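To illustrate why the explicit pairwise matrices can be avoided, the following is a minimal sketch, not the pairwiseMKL implementation. It assumes that each pairwise kernel is a Kronecker product of two smaller kernels (e.g. a drug kernel and a cell-line kernel) and that every drug-cell line pair is present; the abstract does not state either assumption. The function names `kron_matvec` and `mixed_pairwise_matvec` are hypothetical. Under these assumptions, the identity (K_a ⊗ K_b) vec(V) = vec(K_a V K_bᵀ) (row-major vec) lets a weighted sum of pairwise kernels be applied to a vector using only the small matrices.

```python
import numpy as np


def kron_matvec(K_a, K_b, v):
    """Return (K_a kron K_b) @ v without forming the Kronecker product.

    Assumes v is indexed row-major over (object a, object b) pairs.
    """
    n_a, n_b = K_a.shape[0], K_b.shape[0]
    V = v.reshape(n_a, n_b)           # one row per a-object, one column per b-object
    return (K_a @ V @ K_b.T).ravel()  # same row-major ordering as np.kron


def mixed_pairwise_matvec(kernel_pairs, weights, v):
    """Apply sum_k weights[k] * (K_a^k kron K_b^k) to v, one small product at a time."""
    out = np.zeros(v.shape, dtype=float)
    for (K_a, K_b), w in zip(kernel_pairs, weights):
        out += w * kron_matvec(K_a, K_b, v)
    return out


def random_psd(n, rng):
    """Random symmetric positive semidefinite matrix, used only as toy kernel data."""
    X = rng.standard_normal((n, n))
    return X @ X.T


rng = np.random.default_rng(0)
# Toy data: 2 drug kernels (4 drugs) and 2 cell-line kernels (3 cell lines).
drug_kernels = [random_psd(4, rng) for _ in range(2)]
cell_kernels = [random_psd(3, rng) for _ in range(2)]
kernel_pairs = [(Kd, Kc) for Kd in drug_kernels for Kc in cell_kernels]
weights = np.array([0.5, 0.0, 0.3, 0.2])  # illustrative mixture weights, one of them zero
v = rng.standard_normal(4 * 3)

implicit = mixed_pairwise_matvec(kernel_pairs, weights, v)
# Explicit reference, feasible only because the toy problem is tiny.
explicit = sum(w * np.kron(Kd, Kc) for (Kd, Kc), w in zip(kernel_pairs, weights)) @ v
```

For n_a drugs and n_b cell lines, the explicit pairwise matrix has (n_a·n_b)² entries, whereas the implicit product only touches the n_a × n_a and n_b × n_b factors. Handling incomplete sets of pairs, as real bioactivity data would require, is outside the scope of this sketch.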
Code is available at https://github.com/aalto-ics-kepaco.
Supplementary data are available at Bioinformatics online.