Bonvin A M, Brünger A T
Howard Hughes Medical Institute, Yale University, New Haven, CT 06520, USA.
J Mol Biol. 1995 Jun 30;250(1):80-93. doi: 10.1006/jmbi.1995.0360.
In structure determination by X-ray crystallography and solution NMR spectroscopy, experimental data are collected as time and ensemble averages. Thus, in principle, appropriate time- and ensemble-averaged models should be used. Refinement of an ensemble of conformers rather than a single structure against the experimental NMR data could, however, result in overfitting the data because of the significantly increased number of parameters. To avoid overfitting, complete cross-validation, which provides an unbiased measure of the fit, has been applied to nuclear Overhauser effect derived distance refinement. Using two synthetic test cases, a correlation was demonstrated between the cross-validated measure of the fit (defined in terms of root-mean-square deviations from the distance restraints and number of violations) and the number of models that best reproduces the conformational variability in solution. A new method, based on a probability map, has been used to generate good representations of the resulting ensembles of structures. The method has also been applied to observed NMR data for two proteins, interleukin 4 and interleukin 8. For interleukin 4, cross-validation indicates that a single-conformer model gives the most accurate representation of the structure, whereas conventional measures of the deviation between the experimental data and those calculated from the model decrease as the number of conformers increases, indicating overfitting. For interleukin 8, complete cross-validation predicts a twin-conformer model to be the most faithful representation of the experimental data. Two distinct conformations for the loop formed by residues 16 to 22 emerge from the family of twin-conformer structures. The putative alternate conformation of the loop is not observed in the crystal structure of interleukin 8. However, because of crystal packing contacts in this region, this does not necessarily exclude the presence of the alternate conformation in solution.
The twin-conformer model is supported by chemical exchange line broadening observed for the amide of His18 in 15N relaxation studies. This region has also been implicated in receptor binding.
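The cross-validated measure of fit described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the distance restraints are split into k disjoint test sets, that each test set is left out of refinement in turn, and that ensemble distances are combined by r^-6 averaging (a common convention for NOE-derived distances that the abstract itself does not specify). All names, including the `refine` callback that stands in for the actual refinement program, are hypothetical.

```python
import math
import random


def ensemble_distance(distances):
    """Combine one distance over all conformers by r^-6 averaging (assumed)."""
    mean_inv6 = sum(d ** -6 for d in distances) / len(distances)
    return mean_inv6 ** (-1.0 / 6.0)


def violation(distance, lower, upper):
    """Amount by which a distance falls outside its restraint bounds."""
    if distance < lower:
        return lower - distance
    if distance > upper:
        return distance - upper
    return 0.0


def split_test_sets(n_restraints, k, seed=0):
    """Shuffle restraint indices and deal them into k disjoint test sets."""
    indices = list(range(n_restraints))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_rms(restraints, refine, k=10, seed=0):
    """Root-mean-square violation evaluated only on left-out restraints.

    restraints: list of (lower, upper) distance bounds.
    refine: hypothetical callback; takes the working-set indices and returns
            a function mapping a restraint index to the list of
            per-conformer distances in the refined ensemble.
    """
    squared = []
    for test in split_test_sets(len(restraints), k, seed):
        held_out = set(test)
        working = [i for i in range(len(restraints)) if i not in held_out]
        predict = refine(working)  # refinement never sees the test set
        for i in test:
            lower, upper = restraints[i]
            d = ensemble_distance(predict(i))
            squared.append(violation(d, lower, upper) ** 2)
    return math.sqrt(sum(squared) / len(squared))
```

Under these assumptions, one would call `cross_validated_rms` once for each candidate number of conformers and compare the results. A value that stops improving, or worsens, as conformers are added would suggest that the extra parameters are fitting noise rather than real conformational variability.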