Profile-based direct kernels for remote homology detection and fold recognition.

Authors

Rangwala Huzefa, Karypis George

Affiliation

Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA.

Publication information

Bioinformatics. 2005 Dec 1;21(23):4239-47. doi: 10.1093/bioinformatics/bti687. Epub 2005 Sep 27.

DOI: 10.1093/bioinformatics/bti687
PMID: 16188929

Abstract

MOTIVATION

Protein remote homology detection is a central problem in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for remote homology detection. The performance of these methods depends on how the protein sequences are modeled and on the method used to compute the kernel function between them.

RESULTS

We introduce two classes of kernel functions that are constructed by combining sequence profiles with new and existing approaches for determining the similarity between pairs of protein sequences. These kernels are constructed directly from these explicit protein similarity measures and employ effective profile-to-profile scoring schemes for measuring the similarity between pairs of proteins. Experiments with remote homology detection and fold recognition problems show that these kernels are capable of producing results that are substantially better than those produced by all of the existing state-of-the-art SVM-based methods. In addition, the experiments show that these kernels, even when used in the absence of profiles, produce results that are better than those produced by existing non-profile-based schemes.
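The abstract does not give the scoring scheme or say how the explicit similarity measures are turned into valid kernels. As a minimal sketch of the general idea, the Python below builds a symmetric similarity matrix from profile pairs and then makes it positive semidefinite by shifting its diagonal. The function names, the dot-product column score, the `shift` parameter, and the eigenvalue-shift step are all illustrative assumptions, not the authors' published method.

```python
import numpy as np


def profile_similarity(p, q, shift=0.1):
    """Toy profile-to-profile score (an assumption, not the paper's scheme).

    p and q are (length, 20) arrays of per-position amino-acid frequencies.
    Each pair of positions is scored by a dot product minus `shift`; the
    result is the best-scoring diagonal, i.e. the best ungapped overlay
    of the two profiles.
    """
    pair_scores = p @ q.T - shift
    offsets = range(-(p.shape[0] - 1), q.shape[0])
    return max(float(np.trace(pair_scores, offset=k)) for k in offsets)


def similarity_matrix(profiles):
    """Fill a symmetric matrix of pairwise profile similarities."""
    n = len(profiles)
    s = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            # Compute each pair once and mirror it so s is exactly symmetric.
            s[i, j] = s[j, i] = profile_similarity(profiles[i], profiles[j])
    return s


def to_psd_kernel(s):
    """Shift the diagonal so the smallest eigenvalue is non-negative.

    Adding a constant to the diagonal raises every eigenvalue by that
    constant and leaves the off-diagonal similarities unchanged.
    """
    lam_min = np.linalg.eigvalsh(s)[0]  # eigvalsh returns ascending order
    if lam_min < 0:
        return s + (-lam_min) * np.eye(len(s))
    return s.copy()


# Hypothetical data: four random "profiles" of different lengths.
rng = np.random.default_rng(0)
profiles = [rng.dirichlet(np.ones(20), size=n) for n in (12, 15, 9, 20)]

s = similarity_matrix(profiles)
k = to_psd_kernel(s)
print("smallest eigenvalue before shift:", np.linalg.eigvalsh(s).min())
print("smallest eigenvalue after shift: ", np.linalg.eigvalsh(k).min())
```

A matrix like `k` could then be handed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. Shifting the diagonal is only one of several ways to obtain a valid kernel from a similarity measure and is used here because it leaves the pairwise scores untouched.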

AVAILABILITY

The programs for computing the various kernel functions are available on request from the authors.

Similar articles

1
Profile-based direct kernels for remote homology detection and fold recognition.
Bioinformatics. 2005 Dec 1;21(23):4239-47. doi: 10.1093/bioinformatics/bti687. Epub 2005 Sep 27.
2
SVM-HUSTLE--an iterative semi-supervised machine learning approach for pairwise protein remote homology detection.
Bioinformatics. 2008 Mar 15;24(6):783-90. doi: 10.1093/bioinformatics/btn028. Epub 2008 Feb 1.
3
Fast model-based protein homology detection without alignment.
Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.
4
Support vector machines with profile-based kernels for remote protein homology detection.
Genome Inform. 2004;15(2):191-200.
5
Remote homology detection based on oligomer distances.
Bioinformatics. 2006 Sep 15;22(18):2224-31. doi: 10.1093/bioinformatics/btl376. Epub 2006 Jul 12.
6
A structural alignment kernel for protein structures.
Bioinformatics. 2007 May 1;23(9):1090-8. doi: 10.1093/bioinformatics/btl642. Epub 2007 Jan 18.
7
Mismatch string kernels for discriminative protein classification.
Bioinformatics. 2004 Mar 1;20(4):467-76. doi: 10.1093/bioinformatics/btg431. Epub 2004 Jan 22.
8
Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection.
Bioinformatics. 2008 May 15;24(10):1264-70. doi: 10.1093/bioinformatics/btn112. Epub 2008 Mar 31.
9
Protein remote homology detection based on auto-cross covariance transformation.
Comput Biol Med. 2011 Aug;41(8):640-7. doi: 10.1016/j.compbiomed.2011.05.015. Epub 2011 Jun 12.
10
Remote protein homology detection and fold recognition using two-layer support vector machine classifiers.
Comput Biol Med. 2011 Aug;41(8):687-99. doi: 10.1016/j.compbiomed.2011.06.004. Epub 2011 Jun 25.

Cited by

1
Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models.
Int J Mol Sci. 2023 Nov 1;24(21):15858. doi: 10.3390/ijms242115858.
2
BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models.
Nucleic Acids Res. 2021 Dec 16;49(22):e129. doi: 10.1093/nar/gkab829.
3
Remote homology clustering identifies lowly conserved families of effector proteins in plant-pathogenic fungi.
Microb Genom. 2021 Sep;7(9). doi: 10.1099/mgen.0.000637.
4
The parallelism motifs of genomic data analysis.
Philos Trans A Math Phys Eng Sci. 2020 Mar 6;378(2166):20190394. doi: 10.1098/rsta.2019.0394. Epub 2020 Jan 20.
5
Protein remote homology detection based on bidirectional long short-term memory.
BMC Bioinformatics. 2017 Oct 10;18(1):443. doi: 10.1186/s12859-017-1842-2.
6
Protein Remote Homology Detection Based on an Ensemble Learning Approach.
Biomed Res Int. 2016;2016:5813645. doi: 10.1155/2016/5813645. Epub 2016 May 8.
7
Protein remote homology detection by combining Chou's distance-pair pseudo amino acid composition and principal component analysis.
Mol Genet Genomics. 2015 Oct;290(5):1919-31. doi: 10.1007/s00438-015-1044-4. Epub 2015 Apr 21.
8
Protein fold recognition using geometric kernel data fusion.
Bioinformatics. 2014 Jul 1;30(13):1850-7. doi: 10.1093/bioinformatics/btu118. Epub 2014 Mar 3.
9
16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing.
BMC Syst Biol. 2013;7 Suppl 4(Suppl 4):S11. doi: 10.1186/1752-0509-7-S4-S11. Epub 2013 Oct 23.
10
Using distances between Top-n-gram and residue pairs for protein remote homology detection.
BMC Bioinformatics. 2014;15 Suppl 2(Suppl 2):S3. doi: 10.1186/1471-2105-15-S2-S3. Epub 2014 Jan 24.