gkmSVM: an R package for gapped-kmer SVM.

Authors

Ghandi Mahmoud, Mohammad-Noori Morteza, Ghareghani Narges, Lee Dongwon, Garraway Levi, Beer Michael A

Affiliations

The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran.

Publication information

Bioinformatics. 2016 Jul 15;32(14):2205-7. doi: 10.1093/bioinformatics/btw203. Epub 2016 Apr 19.

Abstract

We present a new R package for training gapped-kmer SVM classifiers for DNA and protein sequences. We describe an improved algorithm for kernel matrix calculation that runs about 2 to 5 times faster than our original gkmSVM algorithm. The package supports several sequence kernels, including gkmSVM, kmer-SVM, the mismatch kernel and the wildcard kernel.
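To make the idea of a gapped k-mer kernel matrix concrete, the following Python sketch counts gapped k-mers by brute force and compares sequences by the normalized inner product of those counts. It is an illustration only, not the package's improved algorithm or its R interface: the function names, the small default values `l=4` and `k=2`, and the omission of reverse-complement handling are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def gapped_kmer_counts(seq, l=4, k=2):
    """Count gapped k-mers: every l-long window, keeping k of its l positions.

    A feature is identified by which positions are kept (the mask) and by
    the letters at those positions. Hypothetical helper for illustration.
    """
    counts = Counter()
    masks = list(combinations(range(l), k))
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for mask in masks:
            counts[(mask, "".join(window[i] for i in mask))] += 1
    return counts


def gkm_kernel(a, b, l=4, k=2):
    """Normalized inner product of the two gapped k-mer count vectors."""
    ca = gapped_kmer_counts(a, l, k)
    cb = gapped_kmer_counts(b, l, k)
    dot = sum(v * cb[key] for key, v in ca.items())
    norm = sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def kernel_matrix(seqs, l=4, k=2):
    """Naive all-pairs kernel matrix; cost grows quickly with l and len(seqs)."""
    return [[gkm_kernel(a, b, l, k) for b in seqs] for a in seqs]


if __name__ == "__main__":
    for row in kernel_matrix(["ACGTACGT", "ACGTTCGT", "GGGGCCCC"]):
        print([round(x, 3) for x in row])
```

Enumerating every mask for every window is what makes this direct approach slow, which is why a faster kernel matrix calculation matters in practice.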

AVAILABILITY AND IMPLEMENTATION

The gkmSVM package is freely available through the Comprehensive R Archive Network (CRAN) for Linux, Mac OS and Windows platforms. The C++ implementation is available at www.beerlab.org/gkmsvm.

CONTACT

mghandi@gmail.com or mbeer@jhu.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
