School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou Str., Zografos, 15780 Athens, Greece.
J Biomed Inform. 2010 Apr;43(2):257-67. doi: 10.1016/j.jbi.2010.01.005. Epub 2010 Jan 25.
A set of proteins is a complex system whose elements are interrelated through sequence- and structure-based similarity. Here, we applied a similarity network-based methodology to represent and analyze sets of protein sequences and structures, using a non-redundant set of 311 proteins and three different information criteria based on sequence-derived features, local sequence alignment, and structural alignment. A wide set of measurements, such as network degree, clustering coefficient, characteristic path length, and vertex centrality, was used to characterize the networks' topology. The protein similarity networks were found to be moderately or highly interconnected, and the coexistence of clusters and random edges classified their fully connected versions as Small World Networks (SWNs). The SWN architecture was able to accommodate the continuous similarity transition among proteins and to model the protein information flow during evolution. Notably, recently reported ancestral elements, such as the alpha/beta class and certain folds, were found to act as hubs in the networks. Additionally, the moderate information value of sequence-derived features for fold and class assignment was demonstrated on a network basis. The methodology described here can be applied to the analysis of other complex systems that consist of interrelated elements and a certain information flow.
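The topology measurements named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 5x5 similarity matrix `toy_sim`, the cutoff of 0.5, and all function names are hypothetical, chosen only to show how a thresholded similarity network yields a mean degree, an average clustering coefficient, and a characteristic path length.

```python
from collections import deque
from itertools import combinations


def build_network(sim, threshold):
    """Link proteins i and j when their similarity score is >= threshold."""
    adj = {i: set() for i in range(len(sim))}
    for i, j in combinations(range(len(sim)), 2):
        if sim[i][j] >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def mean_degree(adj):
    """Average number of neighbours per vertex."""
    return sum(len(nb) for nb in adj.values()) / len(adj)


def clustering_coefficient(adj):
    """Average local clustering; vertices with degree < 2 contribute 0."""
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue
        # Count edges that exist among the neighbours of this vertex.
        links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


# Hypothetical pairwise similarity scores for five proteins (symmetric).
toy_sim = [
    [1.0, 0.9, 0.8, 0.2, 0.1],
    [0.9, 1.0, 0.7, 0.3, 0.1],
    [0.8, 0.7, 1.0, 0.6, 0.2],
    [0.2, 0.3, 0.6, 1.0, 0.5],
    [0.1, 0.1, 0.2, 0.5, 1.0],
]

net = build_network(toy_sim, threshold=0.5)  # cutoff is an assumption
hub = max(net, key=lambda v: len(net[v]))    # highest-degree vertex

print("mean degree:", mean_degree(net))
print("clustering coefficient:", clustering_coefficient(net))
print("characteristic path length:", characteristic_path_length(net))
print("hub vertex:", hub)
```

A small-world network is usually recognized by a clustering coefficient well above that of a random graph with the same number of vertices and edges, combined with a comparably short characteristic path length; a comparison against such a random baseline would be the natural next step for this sketch.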