The metric space of proteins - comparative study of clustering algorithms.

Authors

Sasson Ori, Linial Nathan, Linial Michal

Affiliations

School of Computer Science and Engineering; Department of Biological Chemistry, Institute of Life Sciences, Hebrew University, Jerusalem 91904, Israel.

Publication information

Bioinformatics. 2002;18 Suppl 1:S14-21. doi: 10.1093/bioinformatics/18.suppl_1.s14.

Abstract

MOTIVATION

A large fraction of biological research concentrates on individual proteins and on small families of proteins. One of the current major challenges in bioinformatics is to extend our knowledge to very large sets of proteins. Several major projects have tackled this problem. Such undertakings usually start with a process that clusters all known proteins or large subsets of this space. Some work in this area is carried out automatically, while other attempts incorporate expert advice and annotation.

RESULTS

We propose a novel technique that automatically clusters protein sequences. We consider all proteins in SWISSPROT, and carry out an all-against-all BLAST similarity test among them. With this similarity measure in hand we proceed to perform a continuous bottom-up clustering process by applying alternative rules for merging clusters. The outcome of this clustering process is a classification of the input proteins into a hierarchy of clusters of varying degrees of granularity. Here we compare the clusters that result from alternative merging rules, and validate the results against InterPro. Our preliminary results show that clusters that are consistent with several rather than a single merging rule tend to comply with InterPro annotation. This is an affirmation of the view that the protein space consists of families that differ markedly in their evolutionary conservation.
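
The bottom-up procedure described above can be illustrated with a minimal sketch. The abstract does not say which merging rules are compared, so the arithmetic and geometric means used here are assumptions chosen only for illustration, as are the function names and the toy distance values (which are not BLAST output). The sketch builds one hierarchy per rule and keeps the clusters on which both rules agree; the InterPro validation step is not modeled.

```python
from itertools import combinations
from math import prod

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    return prod(values) ** (1.0 / len(values))

def linkage(a, b, dist, merge_rule):
    """Distance between clusters a and b: merge_rule over all cross pairs."""
    return merge_rule([dist[frozenset((x, y))] for x in a for y in b])

def agglomerate(items, dist, merge_rule):
    """Bottom-up clustering; returns every cluster formed along the way."""
    clusters = [frozenset([x]) for x in items]
    formed = set(clusters)
    while len(clusters) > 1:
        # Merge the pair of clusters that the rule scores as closest.
        a, b = min(combinations(clusters, 2),
                   key=lambda p: linkage(p[0], p[1], dist, merge_rule))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        formed.add(a | b)
    return formed

# Made-up pairwise distances (lower = more similar); NOT real BLAST output.
toy = {"AB": 1.0, "AC": 2.0, "BC": 32.0, "CD": 10.0, "AD": 40.0, "BD": 40.0}
dist = {frozenset(pair): d for pair, d in toy.items()}

arith = agglomerate("ABCD", dist, arithmetic_mean)  # hypothetical rule 1
geo = agglomerate("ABCD", dist, geometric_mean)     # hypothetical rule 2

# Non-trivial clusters that appear in both hierarchies.
shared = {c for c in arith & geo if len(c) > 1}
print(sorted("".join(sorted(c)) for c in shared))
```

On this toy input both rules first merge A with B, then diverge: the arithmetic mean joins C with D, while the geometric mean attaches C to {A, B}. Only {A, B} and the root cluster survive in both hierarchies, which is the kind of rule-independent cluster the abstract reports as tending to agree with InterPro.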
