Hamp Tobias, Goldberg Tatyana, Rost Burkhard
Bioinformatics & Computational Biology - I12, Department of Informatics, Technical University of Munich, Garching/Munich, Germany.
PLoS One. 2013 Jun 18;8(6):e68459. doi: 10.1371/journal.pone.0068459. Print 2013.
One of the most accurate multi-class protein classification systems continues to be the profile-based SVM kernel introduced by the Leslie group. Unfortunately, its CPU requirements make it too slow for practical use in large-scale classification tasks. Here, we introduce several software improvements that enable significant acceleration. Using various non-redundant data sets, we demonstrate that our new implementation reaches a speed-up of up to 14-fold for calculating the same kernel matrix. Some predictions are over 200 times faster, possibly making the kernel the top contender in the trade-off between runtime and prediction performance. Additionally, we explain how to parallelize various computations and provide an integrative program that reduces creating a production-quality classifier to a single program call. The new implementation is available as a Debian package under a free academic license and does not depend on commercial software. For non-Debian-based distributions, the source package ships with a traditional Makefile-based installer. Download and installation instructions can be found at https://rostlab.org/owiki/index.php/Fast_Profile_Kernel. Bugs and other issues may be reported at https://rostlab.org/bugzilla3/enter_bug.cgi?product=fastprofkernel.