Hennetin Jérôme, Le Tuan Khanh, Canard Luc, Colloc'h Nathalie, Mornon Jean-Paul, Callebaut Isabelle
Systèmes moléculaires et Biologie structurale, LMCP, CNRS UMR 7590, Universités Paris 6 & Paris 7, case 115, 4 place Jussieu, 75252 Paris Cedex 05, France.
Proteins. 2003 May 1;51(2):236-44. doi: 10.1002/prot.10355.
Patterns of hydrophobic and hydrophilic residues (binary patterns) play an important role in protein architecture and can be roughly categorized into two classes according to their preferential participation in alpha-helices or beta-strands. However, a single binary pattern can be embedded in different longer patterns carrying opposite structural information and thus cannot be as informative as expected. Here, we consider conditional binary patterns, or hydrophobic clusters, whose existence is conditioned by the presence of a minimum number of nonhydrophobic residues, called the connectivity distance, that separate two hydrophobic amino acids assumed to belong to two distinct patterns. Conditional binary patterns differ from simple ones in that they are not intertwined, i.e., they cannot include or be included in other conditional patterns. They therefore carry much more differentiated information and, in particular, are dramatically better correlated with regular secondary structures (especially beta ones). The distribution of these nonintertwined binary patterns in natural proteins was assessed relative to randomness, revealing the structural bricks that are favored and disfavored by evolutionary selection. Several connectivity distances and several hydrophobic alphabets were tested. For highlighting fundamental properties of protein folds, the results show the clear superiority of a connectivity distance of 4, which mimics the usual minimum length of loops in globular domains, and of the VILFMYW alphabet, selected from structural data (secondary structure propensity and Voronoi tessellation).
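The definition of conditional binary patterns in the abstract can be illustrated with a minimal sketch. It assumes a simple reading of that definition: residues in the VILFMYW alphabet are coded 1 and all others 0, and two hydrophobic residues fall into distinct clusters when at least `connectivity_distance` nonhydrophobic residues separate them. The function names, the `(start, pattern)` return format, and the example sequence are hypothetical and chosen for illustration only. This is not the authors' implementation.

```python
# Hydrophobic alphabet reported in the abstract as performing best.
HYDROPHOBIC = frozenset("VILFMYW")


def to_binary(seq, alphabet=HYDROPHOBIC):
    """Encode a sequence as a string of 1 (hydrophobic) and 0 (other)."""
    return "".join("1" if aa in alphabet else "0" for aa in seq.upper())


def hydrophobic_clusters(seq, connectivity_distance=4, alphabet=HYDROPHOBIC):
    """Split a sequence into nonintertwined binary patterns.

    Returns a list of (start_index, pattern) tuples. Each pattern starts
    and ends with a hydrophobic residue, and every run of nonhydrophobic
    residues inside it is shorter than connectivity_distance.
    """
    binary = to_binary(seq, alphabet)
    clusters = []
    start = None      # index of the first 1 in the cluster being built
    last_one = None   # index of the most recent 1 seen
    for i, bit in enumerate(binary):
        if bit != "1":
            continue
        if start is None:
            start = i
        elif i - last_one - 1 >= connectivity_distance:
            # Enough nonhydrophobic residues: close the current cluster.
            clusters.append((start, binary[start:last_one + 1]))
            start = i
        last_one = i
    if start is not None:
        clusters.append((start, binary[start:last_one + 1]))
    return clusters


if __name__ == "__main__":
    example = "VAVLKKKKDGIKEF"  # made-up sequence, not real data
    print(to_binary(example))                # 10110000001001
    print(hydrophobic_clusters(example))     # [(0, '1011'), (10, '1001')]
    print(hydrophobic_clusters(example, 2))  # [(0, '1011'), (10, '1'), (13, '1')]
```

Because every cluster is delimited by gaps of at least the connectivity distance, no cluster can contain or be contained in another, which is the nonintertwined property the abstract describes. Lowering the distance from 4 to 2 in the example splits the second cluster, showing how the choice of connectivity distance changes the pattern vocabulary.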