Efficient remote homology detection using local structure.

Authors

Hou Yuna, Hsu Wynne, Lee Mong Li, Bystroff Christopher

Affiliation

School of Computing, National University of Singapore, Singapore 117543.

Publication

Bioinformatics. 2003 Nov 22;19(17):2294-301. doi: 10.1093/bioinformatics/btg317.

DOI:10.1093/bioinformatics/btg317
PMID:14630658
Abstract

MOTIVATION

The function of an unknown biological sequence can often be accurately inferred if we are able to map this unknown sequence to its corresponding homologous family. At present, discriminative methods such as SVM-Fisher and SVM-pairwise, which combine support vector machine (SVM) and sequence similarity, are recognized as the most accurate methods, with SVM-pairwise being the most accurate. However, these methods typically encode sequence information into their feature vectors and ignore the structure information. They are also computationally inefficient. Based on these observations, we present an alternative method for SVM-based protein classification. Our proposed method, SVM-I-sites, utilizes structure similarity for remote homology detection.
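As a rough illustration of what "encoding sequence information into feature vectors" can look like, the sketch below builds a pairwise-similarity feature vector for each protein: one score against every training sequence. This is not the authors' code and it is not SVM-I-sites. The `similarity` function (Jaccard overlap of 3-mers) is a toy stand-in assumed here for illustration, the function names are hypothetical, and the sequences are made up. Only feature construction is shown; the resulting vectors would then be passed to an SVM.

```python
def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Toy similarity score (assumption): Jaccard overlap of k-mer sets.

    A real pairwise method would use an alignment score here instead.
    """
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def pairwise_features(query, training_seqs, k=3):
    """Feature vector for `query`: one similarity score per training sequence."""
    return [similarity(query, t, k) for t in training_seqs]


# Made-up example sequences (not from the SCOP data set).
training = ["MKVLAAGIVG", "MKVLSAGIVG", "GSHMTTPSQE"]

# One row per protein, one column per training sequence.
# Building this matrix takes len(training) ** 2 similarity computations.
feature_matrix = [pairwise_features(s, training) for s in training]

for seq, row in zip(training, feature_matrix):
    print(seq, [round(x, 3) for x in row])
```

Because every protein is scored against every training sequence, the number of comparisons grows quadratically with the training set, which is one plausible reading of the efficiency concern raised above.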

RESULT

We run experiments on the Structural Classification of Proteins 1.53 data set. The results show that SVM-I-sites is more efficient than SVM-pairwise. Further, we find that SVM-I-sites outperforms sequence-based methods such as PSI-BLAST, SAM, and SVM-Fisher while achieving a comparable performance with SVM-pairwise.

AVAILABILITY

I-sites server is accessible through the web at http://www.bioinfo.rpi.edu. Programs are available upon request for academics. Licensing agreements are available for commercial interests. The framework of encoding local structure into feature vector is available upon request.

Similar articles

1. Efficient remote homology detection using local structure.
Bioinformatics. 2003 Nov 22;19(17):2294-301. doi: 10.1093/bioinformatics/btg317.
2. SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.
BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.
3. Application of latent semantic analysis to protein remote homology detection.
Bioinformatics. 2006 Feb 1;22(3):285-90. doi: 10.1093/bioinformatics/bti801. Epub 2005 Nov 29.
4. SVM-HUSTLE--an iterative semi-supervised machine learning approach for pairwise protein remote homology detection.
Bioinformatics. 2008 Mar 15;24(6):783-90. doi: 10.1093/bioinformatics/btn028. Epub 2008 Feb 1.
5. Profile-based string kernels for remote homology detection and motif extraction.
J Bioinform Comput Biol. 2005 Jun;3(3):527-50. doi: 10.1142/s021972000500120x.
6. Protein homology detection using string alignment kernels.
Bioinformatics. 2004 Jul 22;20(11):1682-9. doi: 10.1093/bioinformatics/bth141. Epub 2004 Feb 26.
7. Remote protein homology detection and fold recognition using two-layer support vector machine classifiers.
Comput Biol Med. 2011 Aug;41(8):687-99. doi: 10.1016/j.compbiomed.2011.06.004. Epub 2011 Jun 25.
8. Support vector machines with profile-based kernels for remote protein homology detection.
Genome Inform. 2004;15(2):191-200.
9. Remote homology detection: a motif based approach.
Bioinformatics. 2003;19 Suppl 1:i26-33. doi: 10.1093/bioinformatics/btg1002.
10. PairProSVM: protein subcellular localization based on local pairwise profile alignment and SVM.
IEEE/ACM Trans Comput Biol Bioinform. 2008 Jul-Sep;5(3):416-22. doi: 10.1109/TCBB.2007.70256.

Cited by

1. Using machine learning tools for protein database biocuration assistance.
Sci Rep. 2018 Jul 5;8(1):10148. doi: 10.1038/s41598-018-28330-z.
2. Label noise in subtype discrimination of class C G protein-coupled receptors: A systematic approach to the analysis of classification errors.
BMC Bioinformatics. 2015 Sep 29;16:314. doi: 10.1186/s12859-015-0731-9.
3. Using distances between Top-n-gram and residue pairs for protein remote homology detection.
BMC Bioinformatics. 2014;15 Suppl 2(Suppl 2):S3. doi: 10.1186/1471-2105-15-S2-S3. Epub 2014 Jan 24.
4. Using amino acid physicochemical distance transformation for fast protein remote homology detection.
PLoS One. 2012;7(9):e46633. doi: 10.1371/journal.pone.0046633. Epub 2012 Sep 28.
5. A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models.
BMC Bioinformatics. 2011 Mar 23;12:83. doi: 10.1186/1471-2105-12-83.
6. Incorporation of local structural preference potential improves fold recognition.
PLoS One. 2011 Feb 18;6(2):e17215. doi: 10.1371/journal.pone.0017215.
7. Physicochemical property distributions for accurate and rapid pairwise protein homology detection.
BMC Bioinformatics. 2010 Mar 19;11:145. doi: 10.1186/1471-2105-11-145.
8. Classification of protein sequences by means of irredundant patterns.
BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S16. doi: 10.1186/1471-2105-11-S1-S16.
9. A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis.
BMC Bioinformatics. 2008 Dec 1;9:510. doi: 10.1186/1471-2105-9-510.
10. Application of nonnegative matrix factorization to improve profile-profile alignment features for fold recognition and remote homolog detection.
BMC Bioinformatics. 2008 Jul 1;9:298. doi: 10.1186/1471-2105-9-298.