Exploiting physico-chemical properties in string kernels.

Affiliation

Center for Bioinformatics, Eberhard-Karls-Universität, Sand 14, 72076 Tübingen, Germany.

Publication details

BMC Bioinformatics. 2010 Oct 26;11 Suppl 8(Suppl 8):S7. doi: 10.1186/1471-2105-11-S8-S7.

DOI:10.1186/1471-2105-11-S8-S7
PMID:21034432
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2966294/
Abstract

BACKGROUND

String kernels are commonly used for the classification of biological sequences, both nucleotide and amino acid sequences. Although string kernels are already very powerful, they have a major shortcoming when applied to amino acids: they ignore an important piece of information when comparing amino acids, namely physico-chemical properties such as size, hydrophobicity, or charge. This information is very valuable, especially when training data are scarce. So far, only very few approaches have aimed at combining these two ideas.
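To make this shortcoming concrete, here is a minimal sketch (not the authors' method) that contrasts the all-or-nothing residue comparison implicit in standard string kernels with an RBF similarity on one physico-chemical property. Using the Kyte-Doolittle hydropathy scale as the only descriptor, the function names, and `sigma=2.0` are illustrative assumptions; the paper's own descriptor set may differ.

```python
import math

# Kyte-Doolittle hydropathy scale: ONE illustrative physico-chemical property.
# Assumption for this sketch; not necessarily the descriptors used in the paper.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def exact_match(a: str, b: str) -> float:
    """Residue comparison implicit in standard string kernels: identical or not."""
    return 1.0 if a == b else 0.0


def property_rbf(a: str, b: str, sigma: float = 2.0) -> float:
    """Gaussian (RBF) similarity of two residues' hydropathy values.

    sigma is a hypothetical bandwidth chosen only for illustration.
    """
    d = KD[a] - KD[b]
    return math.exp(-d * d / (2.0 * sigma**2))


# I and L are both hydrophobic; D is charged and hydrophilic.
for a, b in [("I", "L"), ("I", "D")]:
    print(a, b, exact_match(a, b), round(property_rbf(a, b), 3))
```

Under exact matching, the substitutions I→L and I→D are equally dissimilar (both score 0), whereas the property-based similarity keeps I and L close (about 0.94) and pushes I and D apart.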

RESULTS

We propose new string kernels that combine the benefits of physico-chemical descriptors for amino acids with those of string kernels. The benefits of the proposed kernels are assessed on two problems: MHC-peptide binding classification using position-specific kernels, and protein classification based on the substring spectrum of the sequences. Our experiments demonstrate that incorporating amino acid properties into string kernels improves performance compared to standard string kernels and to previously proposed non-substring kernels.
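For fixed-length peptides such as MHC ligands, a position-specific kernel compares residues at the same position. The sketch below is an illustration under stated assumptions, not the authors' kernel: it replaces the exact-match test with an RBF residue similarity on a single descriptor (Kyte-Doolittle hydropathy) and sums over positions. A sum of positive semi-definite kernels is again positive semi-definite, so the result is a valid kernel. The peptides, the descriptor, and `sigma` are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy; a stand-in descriptor assumed for this sketch.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def residue_rbf(a: str, b: str, sigma: float = 2.0) -> float:
    """RBF similarity between two residues' descriptor values."""
    d = KD[a] - KD[b]
    return math.exp(-d * d / (2.0 * sigma**2))


def _check_lengths(x: str, y: str) -> None:
    """Position-specific comparison only makes sense for aligned, equal-length peptides."""
    if len(x) != len(y):
        raise ValueError("position-specific kernels need equal-length peptides")


def position_exact_kernel(x: str, y: str) -> float:
    """Baseline: number of positions where the residues are identical."""
    _check_lengths(x, y)
    return float(sum(a == b for a, b in zip(x, y)))


def position_rbf_kernel(x: str, y: str, sigma: float = 2.0) -> float:
    """Property-aware variant: sum of per-position residue similarities."""
    _check_lengths(x, y)
    return sum(residue_rbf(a, b, sigma) for a, b in zip(x, y))


# Hypothetical 9-mers differing from ref only at position 8 (I -> L vs. I -> D).
ref, conservative, radical = "ALAKAAAIV", "ALAKAAALV", "ALAKAAADV"
for name, pep in [("conservative", conservative), ("radical", radical)]:
    print(name, position_exact_kernel(ref, pep), round(position_rbf_kernel(ref, pep), 3))
```

The exact-match baseline scores both variants identically (8 of 9 positions match), while the property-aware kernel ranks the conservative substitution closer to the reference peptide.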

CONCLUSIONS

In summary, the proposed modifications, in particular the combination with the RBF substring kernel, consistently yield improvements without affecting the computational complexity. The proposed kernels therefore appear to be the kernels of choice for any protein sequence-based inference.
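The substring-spectrum setting can be sketched the same way. Below, `spectrum_kernel` is the standard k-spectrum kernel (inner product of k-mer counts), and `rbf_substring_kernel` is a hypothetical property-aware variant that compares every k-mer of one sequence with every k-mer of the other through an RBF on their residues' descriptor values. It illustrates the idea under stated assumptions (a single hydropathy descriptor, `k=3`, `sigma=2.0`) and is not the authors' RBF substring kernel. Its naive double loop is quadratic in the number of distinct k-mers, so it says nothing about how the paper keeps the computational complexity unchanged.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy; a stand-in descriptor assumed for this sketch.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def kmer_spectrum(seq: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of seq."""
    return Counter(seq[i : i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 3) -> float:
    """Standard k-spectrum kernel: inner product of k-mer count vectors."""
    sx, sy = kmer_spectrum(x, k), kmer_spectrum(y, k)
    # Counter returns 0 for k-mers absent from y, so only shared k-mers count.
    return float(sum(c * sy[u] for u, c in sx.items()))


def rbf_substring_kernel(x: str, y: str, k: int = 3, sigma: float = 2.0) -> float:
    """Hypothetical variant: RBF similarity summed over all pairs of k-mers.

    Each k-mer is treated as the vector of its residues' hydropathy values;
    each pair contributes (count in x) * (count in y) * similarity.
    """
    sx, sy = kmer_spectrum(x, k), kmer_spectrum(y, k)
    total = 0.0
    for u, cu in sx.items():
        for v, cv in sy.items():
            d2 = sum((KD[a] - KD[b]) ** 2 for a, b in zip(u, v))
            total += cu * cv * math.exp(-d2 / (2.0 * sigma**2))
    return total


x, y, z = "GAIKG", "GALKG", "GADKG"  # hypothetical sequences: I -> L vs. I -> D
print(spectrum_kernel(x, y), spectrum_kernel(x, z))
print(round(rbf_substring_kernel(x, y), 3), round(rbf_substring_kernel(x, z), 3))
```

Here the plain spectrum kernel finds no shared 3-mers in either pair, while the property-aware variant rates the conservative I→L substitution as far more similar than I→D. Because the variant is an inner product of count-weighted RBF feature maps, it remains a valid kernel.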

AVAILABILITY

Data sets, code and additional information are available from http://www.fml.tuebingen.mpg.de/raetsch/suppl/aask. Implementations of the developed kernels are available as part of the Shogun toolbox.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/116d/2966294/56dbab950869/1471-2105-11-S8-S7-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/116d/2966294/36ab6cfbbec7/1471-2105-11-S8-S7-1.jpg

Similar articles

1
Exploiting physico-chemical properties in string kernels.
BMC Bioinformatics. 2010 Oct 26;11 Suppl 8(Suppl 8):S7. doi: 10.1186/1471-2105-11-S8-S7.
2
Semi-supervised protein classification using cluster kernels.
Bioinformatics. 2005 Aug 1;21(15):3241-7. doi: 10.1093/bioinformatics/bti497. Epub 2005 May 19.
3
A weighted string kernel for protein fold recognition.
BMC Bioinformatics. 2017 Aug 25;18(1):378. doi: 10.1186/s12859-017-1795-5.
4
Protein homology detection using string alignment kernels.
Bioinformatics. 2004 Jul 22;20(11):1682-9. doi: 10.1093/bioinformatics/bth141. Epub 2004 Feb 26.
5
Learned random-walk kernels and empirical-map kernels for protein sequence classification.
J Comput Biol. 2009 Mar;16(3):457-74. doi: 10.1089/cmb.2008.0031.
6
Mismatch string kernels for discriminative protein classification.
Bioinformatics. 2004 Mar 1;20(4):467-76. doi: 10.1093/bioinformatics/btg431. Epub 2004 Jan 22.
7
Profile-based string kernels for remote homology detection and motif extraction.
J Bioinform Comput Biol. 2005 Jun;3(3):527-50. doi: 10.1142/s021972000500120x.
8
High performance set of PseAAC and sequence based descriptors for protein classification.
J Theor Biol. 2010 Sep 7;266(1):1-10. doi: 10.1016/j.jtbi.2010.06.006. Epub 2010 Jun 15.
9
SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.
BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.
10
Application of string kernels in protein sequence classification.
Appl Bioinformatics. 2005;4(1):45-52. doi: 10.2165/00822942-200504010-00005.

Cited by

1
On learning functions over biological sequence space: relating Gaussian process priors, regularization, and gauge fixing.
bioRxiv. 2025 Jul 11:2025.04.26.650699. doi: 10.1101/2025.04.26.650699.
2
On learning functions over biological sequence space: relating Gaussian process priors, regularization, and gauge fixing.
ArXiv. 2025 Jul 11:arXiv:2504.19034v2.
3
Encodings and models for antimicrobial peptide classification for multi-resistant pathogens.
BioData Min. 2019 Mar 4;12:7. doi: 10.1186/s13040-019-0196-x. eCollection 2019.
4
A weighted string kernel for protein fold recognition.
BMC Bioinformatics. 2017 Aug 25;18(1):378. doi: 10.1186/s12859-017-1795-5.
5
Maximum margin classifier working in a set of strings.
Proc Math Phys Eng Sci. 2016 Mar;472(2187):20150551. doi: 10.1098/rspa.2015.0551.
6
Machine learning assisted design of highly active peptides for drug discovery.
PLoS Comput Biol. 2015 Apr 7;11(4):e1004074. doi: 10.1371/journal.pcbi.1004074. eCollection 2015 Apr.
7
MHC2SKpan: a novel kernel based approach for pan-specific MHC class II peptide binding prediction.
BMC Genomics. 2013;14 Suppl 5(Suppl 5):S11. doi: 10.1186/1471-2164-14-S5-S11. Epub 2013 Oct 16.
8
Accelerating the Original Profile Kernel.
PLoS One. 2013 Jun 18;8(6):e68459. doi: 10.1371/journal.pone.0068459. Print 2013.
9
Learning a peptide-protein binding affinity predictor with kernel ridge regression.
BMC Bioinformatics. 2013 Mar 5;14:82. doi: 10.1186/1471-2105-14-82.
10
Exploring sequence characteristics related to high-level production of secreted proteins in Aspergillus niger.
PLoS One. 2012;7(10):e45869. doi: 10.1371/journal.pone.0045869. Epub 2012 Oct 1.