Nanda Vikas, DeGrado William F
Department of Biochemistry and Molecular Biophysics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA.
Proteins. 2005 May 15;59(3):454-66. doi: 10.1002/prot.20382.
In the absence of an experimentally determined structure, numerous methods are available to predict or probe the structure of a target molecule indirectly. Genetic modification of a protein sequence is a powerful tool for identifying key residues involved in binding reactions or protein stability. Mutagenesis data are usually incorporated into the modeling process either through manual inspection of model compatibility with empirical data, or through the generation of geometric constraints linking sensitive residues to a binding interface. We present an approach, derived from statistical studies of lattice models, for introducing mutation information directly into the fitness score. The approach takes into account the phenotype of each mutation (neutral or disruptive) and calculates the energy of a given structure over an ensemble of sequences. The structure prediction procedure searches for the optimal conformation, in which neutral sequences either have no impact on stability or improve it, and disruptive sequences reduce stability relative to wild type. We examine three types of sequence ensembles, drawn from saturation mutagenesis, scanning mutagenesis, and homologous proteins. Incorporating multiple sequences into a statistical ensemble serves to separate the native state energetically from misfolded structures. As a result, mutational information enhances structure prediction enough to improve accuracy even when the force field is poor. Furthermore, by separating the scores of misfolded conformations from the target score, the ensemble energy speeds up conformational search algorithms such as Monte Carlo-based methods.
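The idea of scoring one conformation against an ensemble of sequences can be illustrated with a toy example. The sketch below is not the authors' scoring function, which the abstract does not specify. It assumes a two-letter HP lattice alphabet, a contact energy of -1 per hydrophobic-hydrophobic contact, and a hinge-style penalty with a hypothetical `margin` parameter. The names `contact_energy` and `ensemble_score` are likewise illustrative.

```python
from typing import List, Tuple

Contact = Tuple[int, int]  # indices of two residues in contact in a conformation


def contact_energy(seq: str, contacts: List[Contact]) -> float:
    """Toy HP energy (assumption): each H-H contact contributes -1."""
    n_hh = sum(1 for i, j in contacts if seq[i] == "H" and seq[j] == "H")
    return float(-n_hh)


def ensemble_score(
    wild_type: str,
    contacts: List[Contact],
    mutants: List[Tuple[str, str]],  # (mutant sequence, "neutral" or "disruptive")
    margin: float = 1.0,             # hypothetical: required destabilization
) -> float:
    """Wild-type energy plus penalties for disagreeing with mutant phenotypes."""
    e_wt = contact_energy(wild_type, contacts)
    score = e_wt
    for seq, phenotype in mutants:
        dde = contact_energy(seq, contacts) - e_wt  # > 0 means destabilized
        if phenotype == "neutral":
            # A neutral mutant should not be destabilized in the right structure.
            score += max(0.0, dde)
        elif phenotype == "disruptive":
            # A disruptive mutant should be destabilized by at least `margin`.
            score += max(0.0, margin - dde)
        else:
            raise ValueError(f"unknown phenotype: {phenotype}")
    return score


# Two six-residue conformations on a 2D square lattice, given as contact maps.
WILD_TYPE = "HPPHPH"
NATIVE = [(0, 3), (2, 5)]
DECOY = [(0, 5), (1, 4)]

# Hypothetical mutagenesis data: H3P is disruptive, H5P is neutral.
MUTANTS = [("HPPPPH", "disruptive"), ("HPPHPP", "neutral")]

if __name__ == "__main__":
    for name, conf in (("native", NATIVE), ("decoy", DECOY)):
        print(name, contact_energy(WILD_TYPE, conf),
              ensemble_score(WILD_TYPE, conf, MUTANTS))
```

In this toy case the wild-type energy alone cannot distinguish the two conformations (both score -1.0), whereas the ensemble score favors the conformation consistent with the mutant phenotypes (-1.0 against 1.0), which is the kind of energetic separation the abstract describes.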