Protein Structure Prediction, IMPMC, CNRS UMR 7590, Paris 6 University, 75015 Paris, France.
Biochimie. 2009 Nov-Dec;91(11-12):1465-74. doi: 10.1016/j.biochi.2009.07.016. Epub 2009 Aug 6.
We propose an algorithm that predicts the residues important for the formation of the structure of globular proteins. It relies on a simulation that detects the amino acids with the largest number of neighbours during the early steps of the folding process; these are called MIR (Most Interacting Residues). Independently, describing protein structures as fragments with closed ends reveals a correlation between the fragment extremities and the core of the globule. These fragments, called TEF (Tightened End Fragments), have a rather constant length, typically between 20 and 25 amino acids, and we have previously shown that their extremities are preferentially occupied by MIR. Introducing rules derived from this fragment analysis of tertiary structures smooths the distribution of MIR and gives a better match between TEF ends and MIR. Assessing this prediction of the folding core requires a large family of structures with sequences as different as possible, so a dataset of 56 immunoglobulin structures with various functions but a common fold has been used in this study. This fold was chosen because it is one of the most populated and a large amount of data is available on its nucleus. In the immunoglobulin domain, "functional and structural load is clearly separated: loops are responsible for binding and recognition while interactions between several residues of the buried core provide stability and fast folding"[1]. We then determined the positions likely to be of high importance for the folding process and compared them with published data: High Throw Out Order (HTOO), Conservatism of Conservatism (CoC) or Phi value experiments. The positions that we predict agree reasonably well with the experimental data. Moreover, our prediction goes beyond simply using zero solvent accessibility of amino acids as a criterion to predict the core. We obtain predictions of the same quality for the flavodoxin-like superfamily.
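The idea of flagging residues by neighbour count can be illustrated with a minimal sketch. It is not the authors' simulation: it assumes that a few early folding snapshots are already available as one 3D coordinate per residue, and it uses a plain Euclidean distance cutoff. The function names, the `cutoff`, `min_separation` and `threshold` parameters, and their values are all hypothetical choices made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def neighbour_counts(coords, cutoff=6.0, min_separation=2):
    """Count, for each residue, the residues within `cutoff` of it.

    Assumption: one 3D point per residue. Pairs closer than
    `min_separation` along the chain are skipped so that covalent
    neighbours do not inflate the count.
    """
    n = len(coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + min_separation, n):
            if dist(coords[i], coords[j]) <= cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts


def mean_neighbour_counts(snapshots, **kwargs):
    """Average the per-residue neighbour counts over several snapshots."""
    per_snapshot = [neighbour_counts(s, **kwargs) for s in snapshots]
    n_snap = len(per_snapshot)
    return [sum(col) / n_snap for col in zip(*per_snapshot)]


def most_interacting(mean_counts, threshold):
    """Return indices of residues whose mean count reaches `threshold`."""
    return [i for i, c in enumerate(mean_counts) if c >= threshold]


if __name__ == "__main__":
    # Toy chain of five residues; the last one packs against the first three.
    toy = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (1, 1, 0)]
    means = mean_neighbour_counts([toy, toy], cutoff=1.5)
    print(most_interacting(means, threshold=2))
```

In this toy chain only the last residue is flagged, because it is the only one within the cutoff of several residues that are not adjacent to it in sequence. Averaging over snapshots stands in for the "early steps of the folding process" mentioned above; how those snapshots are generated is outside the scope of this sketch.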