Freudenberg Jan, Zimmer Ralf, Hanisch Daniel, Lengauer Thomas
GMD-Forschungsinstitut Informationstechnik, Schloss Birlinghoven, St. Augustin, Germany.
In Silico Biol. 2002;2(3):339-49.
Classification of proteins is a major challenge in bioinformatics. Here an approach is presented that unifies different existing classifications of protein structures and sequences. Protein structural domains are represented as nodes in a hypergraph, and each sequence family contributes a hyperedge joining all structural domains that are members of that family. The presented method partitions the hypergraph into clusters of structural domains, each based on a set of shared sequence family memberships. Thus, the clusters place existing protein sequence families in the context of structural family hierarchies. Conversely, relating structural domains to their sequence family memberships can yield further knowledge about the respective structural families.
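The hypergraph construction described above can be illustrated with a minimal sketch. It assumes membership data arrive as (domain, family) pairs and uses a deliberately simple, hypothetical clustering rule: domains fall in the same cluster exactly when they belong to the same set of sequence families. This is not the partitioning algorithm of the paper. The function names, the structural-family labels, and all domain and family identifiers are hypothetical placeholders.

```python
from collections import defaultdict

def build_hypergraph(memberships):
    """Map each sequence family to the set of structural domains in it.

    Domains are the nodes; every family becomes one hyperedge.
    """
    hyperedges = defaultdict(set)
    for domain, family in memberships:
        hyperedges[family].add(domain)
    return dict(hyperedges)

def cluster_by_shared_families(hyperedges):
    """Group domains that belong to exactly the same set of families.

    Hypothetical, illustrative rule only; not the clustering used in the paper.
    """
    families_of = defaultdict(set)
    for family, domains in hyperedges.items():
        for domain in domains:
            families_of[domain].add(family)
    clusters = defaultdict(set)
    for domain, families in families_of.items():
        # The frozenset of shared families is the key defining each cluster.
        clusters[frozenset(families)].add(domain)
    return dict(clusters)

def structural_context(clusters, structural_family):
    """For each cluster, collect the structural families its domains fall in."""
    return {key: {structural_family[d] for d in domains}
            for key, domains in clusters.items()}

# Toy data: domain, family, and structural-family names are hypothetical.
memberships = [("d1", "famA"), ("d2", "famA"), ("d2", "famB"),
               ("d3", "famA"), ("d3", "famB"), ("d4", "famC")]
structural_family = {"d1": "S1", "d2": "S1", "d3": "S2", "d4": "S3"}

hyperedges = build_hypergraph(memberships)
clusters = cluster_by_shared_families(hyperedges)
context = structural_context(clusters, structural_family)
for families, domains in sorted(clusters.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(families), sorted(domains), sorted(context[families]))
```

In this toy input, d2 and d3 form one cluster because both belong to famA and famB, and that cluster spans two structural families (S1 and S2), which is the kind of sequence-to-structure relationship the clusters are meant to expose.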