Bastolla Ugo, Ortíz Angel R, Porto Markus, Teichert Florian
Centro de Biología Molecular Severo Ochoa, (CSIC-UAM), Cantoblanco, 28049 Madrid, Spain.
Proteins. 2008 Dec;73(4):872-88. doi: 10.1002/prot.22113.
The complexity of protein structures calls for simplified representations of their topology. The simplest possible mathematical description of a protein structure is a one-dimensional profile representing, for instance, buriedness or secondary structure. Such representations were introduced to study the sequence-structure relationship, with applications to fold recognition. Here we define the effective connectivity (EC) profile, a network-theoretical profile that self-consistently represents the network structure of the protein contact matrix. The EC profile makes the relationship between protein structure and protein sequence mathematically explicit, because it allows one to predict the average hydrophobicity profile (HP) and the distribution of amino acids at each site for families of homologous proteins sharing the same structure. In this sense, the EC provides an analytic solution to the statistical inverse folding problem, which consists of finding the statistical properties of the set of sequences compatible with a given structure. We tested these predictions with simulations of the structurally constrained neutral (SCN) model of protein evolution with structure conservation, for single- and multi-domain proteins and for a wide range of mutation processes, the latter producing sequences with very different hydrophobicity profiles. The EC-based predictions are accurate even when only one sequence of the family is known. The EC profile is also very significantly correlated with the HP for sequence-structure pairs in the PDB. The EC profile generalizes the properties of previously introduced structural profiles to modular proteins such as multidomain chains, and its correlation with the sequence profile is substantially improved with respect to those profiles, particularly for long proteins.
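The idea of reducing a contact matrix to a one-dimensional profile and comparing it with a hydrophobicity profile can be sketched as follows. This is only an illustration and not the EC definition, which the abstract does not spell out: it uses the principal eigenvector of the contact matrix as a stand-in network profile. The distance cutoff, the minimum sequence separation, all function names, and the synthetic coordinates and hydrophobicity values are assumptions made for the example.

```python
import numpy as np

def contact_matrix(coords, cutoff=8.0, min_separation=3):
    """Binary contact matrix from an (N, 3) array of residue coordinates.

    cutoff (in the units of coords) and min_separation are illustrative
    assumptions, not values taken from the article.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    idx = np.arange(len(coords))
    far_in_sequence = np.abs(idx[:, None] - idx[None, :]) >= min_separation
    return ((dist < cutoff) & far_in_sequence).astype(float)

def principal_eigenvector_profile(contacts):
    """Stand-in structural profile (NOT the EC): principal eigenvector of
    the symmetric contact matrix, with positive sign and mean 1."""
    eigenvalues, eigenvectors = np.linalg.eigh(contacts)
    vec = eigenvectors[:, np.argmax(eigenvalues)]
    if vec.sum() < 0:  # eigh leaves the overall sign arbitrary
        vec = -vec
    return vec / vec.mean()

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic placeholder data: a random compact cloud of 60 "residues" and a
# made-up hydrophobicity profile constructed to track the structural profile.
rng = np.random.default_rng(0)
coords = rng.normal(scale=6.0, size=(60, 3))
contacts = contact_matrix(coords)
profile = principal_eigenvector_profile(contacts)
hydrophobicity = profile + rng.normal(scale=0.3, size=profile.size)
print(f"profile/hydrophobicity correlation: {pearson(profile, hydrophobicity):.2f}")
```

With real data, `coords` would come from a structure file and `hydrophobicity` from a sequence mapped through a hydrophobicity scale; both choices are left open here.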
Furthermore, the EC profile has a dynamic interpretation: the EC components are strongly inversely related to the temperature factors measured in X-ray experiments, meaning that positions with a large EC component are more strongly constrained in their equilibrium dynamics. The EC profile also allows one to define a natural measure of modularity that correlates with the number of domains composing the protein, suggesting its application to domain decomposition. Finally, we show that structurally similar proteins have similar EC profiles, so that the similarity between aligned EC profiles can be used as a structure similarity measure, a property that we have recently applied to protein structure alignment. The code for computing the EC profile is available upon request by writing to ubastolla@cbm.uam.es, and the structural profiles discussed in this article can be downloaded from the SLOTH webserver at http://www.fkp.tu-darmstadt.de/SLOTH/.
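Using the similarity between aligned profiles as a structure similarity measure can be illustrated with a short sketch. The abstract does not say which similarity score is used, so the Pearson correlation over aligned positions is an assumption here, as are the function name `aligned_profile_similarity`, the representation of the alignment as index pairs, and the toy profile values.

```python
import numpy as np

def aligned_profile_similarity(profile_a, profile_b, alignment):
    """Pearson correlation between two profiles over aligned positions.

    alignment is a sequence of (i, j) pairs: position i of profile_a is
    aligned with position j of profile_b. Gapped positions are simply
    absent from the list. (Hypothetical helper, not the authors' code.)
    """
    pairs = np.asarray(alignment, dtype=int)
    if pairs.ndim != 2 or pairs.shape[1] != 2 or len(pairs) < 3:
        raise ValueError("need at least three aligned (i, j) pairs")
    a = np.asarray(profile_a, dtype=float)[pairs[:, 0]]
    b = np.asarray(profile_b, dtype=float)[pairs[:, 1]]
    if a.std() == 0 or b.std() == 0:
        raise ValueError("profiles must vary over the aligned positions")
    return float(np.corrcoef(a, b)[0, 1])

# Toy example: profile_b equals profile_a with one inserted position (index 2),
# which the alignment skips as a gap.
profile_a = [0.2, 0.9, 1.5, 1.1, 0.4, 1.8]
profile_b = [0.2, 0.9, 0.0, 1.5, 1.1, 0.4, 1.8]
alignment = [(0, 0), (1, 1), (2, 3), (3, 4), (4, 5), (5, 6)]
similarity = aligned_profile_similarity(profile_a, profile_b, alignment)
print(f"aligned profile similarity: {similarity:.3f}")
```

The same correlation, applied to a profile and the per-residue temperature factors of a structure, would be one simple way to check the inverse relation described above.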