Snyder David A, Grullon Jennifer, Huang Yuanpeng J, Tejero Roberto, Montelione Gaetano T
Department of Chemistry, William Paterson University, Wayne, New Jersey, 07470.
Proteins. 2014 Feb;82 Suppl 2:219-30. doi: 10.1002/prot.24490.
Maximizing the scientific impact of NMR-based structure determination requires robust and statistically sound methods for assessing the precision of NMR-derived structures. In particular, a method to define a core atom set for calculating superimpositions and validating structure predictions is critical to the use of NMR-derived structures as targets in the CASP competition. FindCore (Snyder and Montelione, Proteins 2005;59:673-686) is a superimposition-independent method for identifying a core atom set and partitioning that set into domains. However, because FindCore optimizes superimposition by sensitively excluding atoms that are not well defined, the FindCore core may not comprise all atoms suitable for use in certain applications of NMR structures, including the CASP assessment process. Assessing predicted models against experimental NMR structures in CASP10 therefore required modifying the FindCore method. This paper describes conventions and a standard protocol to calculate an "Expanded FindCore" atom set suitable for validation and application in biological and biophysical contexts. A key application of the Expanded FindCore method is to identify the core set of atoms in an experimental NMR structure against which it is meaningful to validate predicted protein structure models. We demonstrate the application of the Expanded FindCore method in characterizing well-defined regions of 18 NMR-derived CASP10 target structures. The Expanded FindCore protocol defines "expanded core atom sets" that match an expert's intuition of which parts of the structure are sufficiently well defined to use in assessing CASP model predictions. We also illustrate the impact of this analysis on the CASP GDT assessment scores.