The metric space of proteins-comparative study of clustering algorithms.

Authors

Sasson Ori, Linial Nathan, Linial Michal

Affiliation

School of Computer Science and Engineering; Department of Biological Chemistry, Institute of Life Sciences, Hebrew University, Jerusalem 91904, Israel.

Publication

Bioinformatics. 2002;18 Suppl 1:S14-21. doi: 10.1093/bioinformatics/18.suppl_1.s14.

PMID: 12169526

Abstract

MOTIVATION

A large fraction of biological research concentrates on individual proteins and on small families of proteins. One of the current major challenges in bioinformatics is to extend our knowledge to very large sets of proteins. Several major projects have tackled this problem. Such undertakings usually start with a process that clusters all known proteins or large subsets of this space. Some work in this area is carried out automatically, while other attempts incorporate expert advice and annotation.

RESULTS

We propose a novel technique that automatically clusters protein sequences. We consider all proteins in SWISSPROT, and carry out an all-against-all BLAST similarity test among them. With this similarity measure in hand we proceed to perform a continuous bottom-up clustering process by applying alternative rules for merging clusters. The outcome of this clustering process is a classification of the input proteins into a hierarchy of clusters of varying degrees of granularity. Here we compare the clusters that result from alternative merging rules, and validate the results against InterPro. Our preliminary results show that clusters that are consistent with several rather than a single merging rule tend to comply with InterPro annotation. This is an affirmation of the view that the protein space consists of families that differ markedly in their evolutionary conservation.
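
The bottom-up process described above can be illustrated with a minimal sketch. The abstract does not say which merging rules were used, so the three rules here (arithmetic, geometric, and harmonic mean of the pairwise dissimilarities between two clusters) are assumptions chosen for illustration. The toy scores stand in for BLAST-derived dissimilarities, and all function names are hypothetical. The last step keeps only the clusters that appear under every rule, mirroring the idea of clusters consistent with several merging rules; validation against InterPro is not shown.

```python
from itertools import combinations
from math import prod

# Three illustrative merge rules (assumed, not taken from the paper).
def arithmetic(values):
    return sum(values) / len(values)

def geometric(values):
    return prod(values) ** (1.0 / len(values))

def harmonic(values):
    return len(values) / sum(1.0 / v for v in values)

def cluster(dist, rule):
    """Bottom-up clustering; returns the merged clusters in merge order.

    dist: dict mapping frozenset({a, b}) to a positive dissimilarity.
    rule: function summarizing all pairwise dissimilarities between two clusters.
    """
    items = sorted({x for pair in dist for x in pair})
    clusters = [frozenset([x]) for x in items]
    merges = []

    def score(pair):
        a, b = pair
        return rule([dist[frozenset((x, y))] for x in a for y in b])

    while len(clusters) > 1:
        # Merge the two clusters that the rule scores as closest.
        a, b = min(combinations(clusters, 2), key=score)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append(a | b)
    return merges

def consensus(dist, rules):
    """Clusters (as member sets) that are produced under every rule."""
    return set.intersection(*(set(cluster(dist, r)) for r in rules))

# Toy dissimilarities for five hypothetical proteins (lower = more similar).
scores = {
    ("A", "B"): 0.1, ("D", "E"): 0.2,
    ("A", "C"): 1.0, ("B", "C"): 9.0,
    ("C", "D"): 2.0, ("C", "E"): 4.0,
    ("A", "D"): 8.0, ("A", "E"): 8.0,
    ("B", "D"): 8.0, ("B", "E"): 8.0,
}
dist = {frozenset(k): v for k, v in scores.items()}

stable = consensus(dist, [arithmetic, geometric, harmonic])
print(sorted("".join(sorted(c)) for c in stable))  # ['AB', 'ABCDE', 'DE']
```

In this toy data the rules disagree about protein C: the arithmetic and geometric means attach it to {D, E}, while the harmonic mean attaches it to {A, B}. Only {A, B}, {D, E}, and the root cluster survive the consensus step.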
