Accelerating the Original Profile Kernel.

Author information

Hamp Tobias, Goldberg Tatyana, Rost Burkhard

Affiliation

Bioinformatics & Computational Biology - I12, Department of Informatics, Technical University of Munich, Garching/Munich, Germany.

Publication information

PLoS One. 2013 Jun 18;8(6):e68459. doi: 10.1371/journal.pone.0068459. Print 2013.

DOI:10.1371/journal.pone.0068459

PMID:23825697

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3688983/

Abstract

One of the most accurate multi-class protein classification systems continues to be the profile-based SVM kernel introduced by the Leslie group. Unfortunately, its CPU requirements render it too slow for practical applications of large-scale classification tasks. Here, we introduce several software improvements that enable significant acceleration. Using various non-redundant data sets, we demonstrate that our new implementation reaches a maximal speed-up as high as 14-fold for calculating the same kernel matrix. Some predictions are over 200 times faster, which may make the kernel the top contender in terms of a low speed/performance ratio. Additionally, we explain how to parallelize various computations and provide an integrative program that reduces creating a production-quality classifier to a single program call. The new implementation is available as a Debian package under a free academic license and does not depend on commercial software. For non-Debian based distributions, the source package ships with a traditional Makefile-based installer. Download and installation instructions can be found at https://rostlab.org/owiki/index.php/Fast_Profile_Kernel. Bugs and other issues may be reported at https://rostlab.org/bugzilla3/enter_bug.cgi?product=fastprofkernel.
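
The abstract takes the profile kernel itself for granted. As we understand the original definition from the Leslie group, each length-k window of a sequence's profile (e.g. a PSI-BLAST profile) contributes one count to every k-mer β whose score, the sum over the window of -log p_j(β_j), does not exceed a threshold σ. The kernel is then the normalised inner product of these count vectors. The Python sketch below illustrates only that definition. It is not the authors' accelerated implementation, and the function names, the dict-per-column profile format and the pruned depth-first neighbourhood search are our own illustrative assumptions.

```python
"""Minimal sketch of a profile-based string kernel (hypothetical helper names)."""
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def _neighbourhood(window, sigma, alphabet):
    """Yield every k-mer whose summed -log p over `window` is <= sigma."""
    def extend(pos, prefix, score):
        if pos == len(window):
            yield prefix
            return
        for a in alphabet:
            p = window[pos].get(a, 0.0)
            if p <= 0.0:
                continue  # -log 0 is infinite, so this residue can never qualify
            s = score - math.log(p)
            # Every term is >= 0 for p <= 1, so a prefix above sigma can be pruned.
            if s <= sigma:
                yield from extend(pos + 1, prefix + a, s)
    yield from extend(0, "", 0.0)


def profile_features(profile, k, sigma, alphabet=ALPHABET):
    """Feature map: per k-mer, the number of windows whose neighbourhood contains it."""
    features = Counter()
    for start in range(len(profile) - k + 1):
        features.update(_neighbourhood(profile[start:start + k], sigma, alphabet))
    return features


def profile_kernel_matrix(profiles, k, sigma, alphabet=ALPHABET):
    """Cosine-normalised matrix K[i][j] = <f_i, f_j> / sqrt(<f_i, f_i> <f_j, f_j>)."""
    # Each feature map is independent of the others, so this list is the
    # natural place to distribute work across processes.
    feats = [profile_features(p, k, sigma, alphabet) for p in profiles]
    n = len(feats)
    raw = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dot = float(sum(c * feats[j][kmer] for kmer, c in feats[i].items()))
            raw[i][j] = raw[j][i] = dot
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            norm = math.sqrt(raw[i][i] * raw[j][j])
            kernel[i][j] = raw[i][j] / norm if norm > 0 else 0.0
    return kernel


if __name__ == "__main__":
    # Toy two-letter alphabet: a fully conserved profile vs. a uniform one.
    p1 = [{"A": 1.0}, {"A": 1.0}]
    p2 = [{"A": 0.5, "B": 0.5}, {"A": 0.5, "B": 0.5}]
    print(profile_kernel_matrix([p1, p2], k=2, sigma=1.5, alphabet="AB"))
```

In the toy example the conserved profile yields only "AA", while the uniform profile admits all four 2-mers (each scoring 2 ln 2, about 1.39, below σ = 1.5), so the off-diagonal kernel value is 1 / sqrt(1 · 4) = 0.5. The pruned search avoids enumerating all 20^k k-mers per window. Computing the independent feature maps on separate workers is one plausible reading of the parallelisation the abstract mentions, not necessarily the authors' scheme.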


Similar articles

LZW-Kernel: fast kernel utilizing variable length code blocks from LZW compressors for protein sequence classification.

Bioinformatics. 2018 Oct 1;34(19):3281-3288. doi: 10.1093/bioinformatics/bty349.

gkmSVM: an R package for gapped-kmer SVM.

Bioinformatics. 2016 Jul 15;32(14):2205-7. doi: 10.1093/bioinformatics/btw203. Epub 2016 Apr 19.

FastSK: fast sequence analysis with gapped string kernels.

Bioinformatics. 2020 Dec 30;36(Suppl_2):i857-i865. doi: 10.1093/bioinformatics/btaa817.

SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.

BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.

Evolutionary profiles improve protein-protein interaction prediction from sequence.

Bioinformatics. 2015 Jun 15;31(12):1945-50. doi: 10.1093/bioinformatics/btv077. Epub 2015 Feb 4.

A Fast Reduced Kernel Extreme Learning Machine.

Neural Netw. 2016 Apr;76:29-38. doi: 10.1016/j.neunet.2015.10.006. Epub 2016 Jan 6.

A multi-label learning based kernel automatic recommendation method for support vector machine.

PLoS One. 2015 Apr 20;10(3):e0120455. doi: 10.1371/journal.pone.0120455. eCollection 2015.

Direct Kernel Perceptron (DKP): ultra-fast kernel ELM-based classification with non-iterative closed-form weight calculation.

Neural Netw. 2014 Feb;50:60-71. doi: 10.1016/j.neunet.2013.11.002. Epub 2013 Nov 14.

SVM and SVM Ensembles in Breast Cancer Prediction.

PLoS One. 2017 Jan 6;12(1):e0161501. doi: 10.1371/journal.pone.0161501. eCollection 2017.

Cited by

Protein embeddings and deep learning predict binding residues for various ligand classes.

Sci Rep. 2021 Dec 13;11(1):23916. doi: 10.1038/s41598-021-03431-4.

PredictProtein - Predicting Protein Structure and Function for 29 Years.

Nucleic Acids Res. 2021 Jul 2;49(W1):W535-W540. doi: 10.1093/nar/gkab354.

Combining learning and constraints for genome-wide protein annotation.

BMC Bioinformatics. 2019 Jun 17;20(1):338. doi: 10.1186/s12859-019-2875-5.

Detailed prediction of protein sub-nuclear localization.

BMC Bioinformatics. 2019 Apr 23;20(1):205. doi: 10.1186/s12859-019-2790-9.

Computational prediction shines light on type III secretion origins.

Sci Rep. 2016 Oct 7;6:34516. doi: 10.1038/srep34516.

LocTree3 prediction of localization.

Nucleic Acids Res. 2014 Jul;42(Web Server issue):W350-5. doi: 10.1093/nar/gku396. Epub 2014 May 21.

References

Homology-based inference sets the bar high for protein function prediction.

BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S7. doi: 10.1186/1471-2105-14-S3-S7. Epub 2013 Feb 28.

A large-scale evaluation of computational protein function prediction.

Nat Methods. 2013 Mar;10(3):221-7. doi: 10.1038/nmeth.2340. Epub 2013 Jan 27.

LocTree2 predicts localization for all domains of life.

Bioinformatics. 2012 Sep 15;28(18):i458-i465. doi: 10.1093/bioinformatics/bts390.

HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment.

Nat Methods. 2011 Dec 25;9(2):173-5. doi: 10.1038/nmeth.1818.

Improving structure alignment-based prediction of SCOP families using Vorolign kernels.

Bioinformatics. 2011 Jan 15;27(2):204-10. doi: 10.1093/bioinformatics/btq618. Epub 2010 Nov 18.

Ongoing and future developments at the Universal Protein Resource.

Nucleic Acids Res. 2011 Jan;39(Database issue):D214-9. doi: 10.1093/nar/gkq1020. Epub 2010 Nov 4.

Exploiting physico-chemical properties in string kernels.

BMC Bioinformatics. 2010 Oct 26;11 Suppl 8(Suppl 8):S7. doi: 10.1186/1471-2105-11-S8-S7.

A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis.

BMC Bioinformatics. 2008 Dec 1;9:510. doi: 10.1186/1471-2105-9-510.

PairProSVM: protein subcellular localization based on local pairwise profile alignment and SVM.

IEEE/ACM Trans Comput Biol Bioinform. 2008 Jul-Sep;5(3):416-22. doi: 10.1109/TCBB.2007.70256.

SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.

BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.
