Classification of protein sequences by means of irredundant patterns.

Affiliation

Department of Information Engineering, University of Padova, Italy.

Publication information

BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S16. doi: 10.1186/1471-2105-11-S1-S16.

DOI:10.1186/1471-2105-11-S1-S16
PMID:20122187
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3009487/
Abstract

BACKGROUND

The classification of protein sequences using string algorithms provides valuable insights for protein function prediction. Several methods, based on a variety of different patterns, have been previously proposed. Almost all string-based approaches discover patterns that are not "independent," and therefore the associated scores count the contribution of patterns that cover the same region of a sequence multiple times.
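The overcounting problem can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a deliberately naive score that counts every substring shared by two sequences, and the sequences `s` and `t` are made up for illustration. A single shared region of length 5 then contributes 5·6/2 = 15 overlapping patterns to the score.

```python
def common_substrings(s, t):
    """Return the set of all substrings that occur in both s and t."""
    subs_s = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
    subs_t = {t[i:j] for i in range(len(t)) for j in range(i + 1, len(t) + 1)}
    return subs_s & subs_t


# Hypothetical toy sequences that share exactly one region, "ACDEF".
s = "XXACDEFYY"
t = "ZZACDEFWW"

shared = common_substrings(s, t)

# Naive score (an assumption for illustration): one point per shared substring.
naive_score = len(shared)

print(sorted(shared, key=len))
print(naive_score)  # 15: the single region "ACDEF" is counted 15 times
```

Every one of the 15 patterns lies inside the same region, so a score built this way reflects the length of the region far more than the number of independent similarities.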

RESULTS

In this paper we use a class of patterns, called irredundant, that is specifically designed to address this issue. Loosely speaking, the set of irredundant patterns is the smallest class of "independent" patterns that can describe all common patterns in two sequences, and thus it avoids overcounting. We present a novel discriminative method, called Irredundant Class, based on the statistics of irredundant patterns combined with the power of support vector machines.
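To give a feel for what a compact set of "independent" patterns might look like, the sketch below keeps only maximal common substrings, that is, matches between two sequences that cannot be extended to the left or right. This is a simplified stand-in chosen for illustration, not the paper's definition of irredundant patterns, and it does not include the support vector machine step. The function name and the toy sequences are hypothetical.

```python
def maximal_common_substrings(s, t):
    """Common substrings with an occurrence pair that cannot be extended.

    Simplified illustration only; not the irredundant-pattern definition
    used by the Irredundant Class method.
    """
    n, m = len(s), len(t)
    # lcs[i][j] = length of the longest common suffix of s[:i] and t[:j],
    # so every recorded match is already maximal on the left.
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1

    maximal = set()
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            length = lcs[i][j]
            if length == 0:
                continue
            # Keep the match only if it cannot be extended to the right.
            right_blocked = i == n or j == m or s[i] != t[j]
            if right_blocked:
                maximal.add(s[i - length:i])
    return maximal


print(maximal_common_substrings("XXACDEFYY", "ZZACDEFWW"))  # {'ACDEF'}
print(sorted(maximal_common_substrings("ACDXXEFG", "ACDYYEFG")))  # ['ACD', 'EFG']
```

In the first call, one shared region yields one pattern rather than every substring contained in it; in the second, two separate shared regions yield two patterns.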

CONCLUSION

Tests on benchmark data show that Irredundant Class outperforms most of the string algorithms previously proposed, and it achieves results as good as current state-of-the-art methods. Moreover, the footprints of the most discriminative irredundant patterns can be used to guide the identification of functional regions in protein sequences.
