Anton Bernat, Besalú Mireia, Fornes Oriol, Bonet Jaume, Molina Alexis, Molina-Fernandez Ruben, De Las Cuevas Gemma, Fernandez-Fuentes Narcis, Oliva Baldo
Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona 08005, Catalonia, Spain.
Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona 08028, Catalonia, Spain.
NAR Genom Bioinform. 2021 Apr 22;3(2):lqab027. doi: 10.1093/nargab/lqab027. eCollection 2021 Jun.
Direct-coupling analysis (DCA), which studies the coevolution of residues in proteins, has been widely used to predict the three-dimensional structure of a protein from its sequence. We present RADI/raDIMod, a variation of the original DCA algorithm that groups chemically equivalent residues and combines the resulting contact predictions with super-secondary structure motifs to model protein structures. Interestingly, even the simplification obtained by grouping amino acids into only two classes (polar and non-polar) still captures the physicochemical nature that characterizes protein structure, in line with the role of hydrophobic forces in protein-folding funneling. Because the alphabet is compressed, fewer sequences are required in the multiple sequence alignment. The number of long-range contacts predicted is limited; our approach therefore also requires contacts between neighboring sequence positions, which we predict from secondary structure and from motifs of super-secondary structure. Using RADI together with raDIMod, a fragment-based protein structure modelling method, we obtain near-native conformations when the super-secondary motifs cover >30-50% of the sequence. Interestingly, although different alphabets predict different contacts, they produce similar structures.
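The alphabet compression described above can be illustrated with a minimal Python sketch. It is not the RADI implementation: the residue grouping (`NONPOLAR`, `POLAR`), the output symbols `h`/`p`, and the helper names `reduce_sequence`, `reduce_alignment` and `pair_states` are all assumptions made for illustration. It recodes an alignment into two classes plus a gap symbol and shows why fewer sequences are needed: the number of joint states to estimate per pair of columns drops from 21 × 21 to 3 × 3.

```python
# Hypothetical two-class grouping; the exact assignment of residues to
# classes is an assumption, not taken from the RADI paper.
NONPOLAR = set("AVLIMFWPGC")
POLAR = set("STYNQDEKRH")


def reduce_sequence(seq: str) -> str:
    """Recode one aligned sequence as 'h' (non-polar), 'p' (polar) or '-'."""
    out = []
    for aa in seq.upper():
        if aa in NONPOLAR:
            out.append("h")
        elif aa in POLAR:
            out.append("p")
        else:
            # Gaps and unknown symbols are all treated as a gap.
            out.append("-")
    return "".join(out)


def reduce_alignment(msa: list[str]) -> list[str]:
    """Apply the reduced alphabet to every sequence of an alignment."""
    return [reduce_sequence(seq) for seq in msa]


def pair_states(q: int) -> int:
    """Number of joint states for one pair of columns with q symbols each."""
    return q * q


msa = ["MKV-LA", "MRIDLG"]        # toy alignment, invented for this example
reduced = reduce_alignment(msa)
print(reduced)                     # reduced-alphabet alignment
print(pair_states(21), pair_states(3))  # full alphabet + gap vs. two classes + gap
```

With the full alphabet each column pair has 441 joint states whose frequencies must be estimated from the alignment; with two classes and a gap there are only 9, which is consistent with the abstract's statement that a compressed alphabet needs fewer aligned sequences.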