Improving consensus structure by eliminating averaging artifacts.

Author information

Dukka B K C

Affiliation

Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA.

Publication information

BMC Struct Biol. 2009 Mar 6;9:12. doi: 10.1186/1472-6807-9-12.

DOI:10.1186/1472-6807-9-12

PMID:19267905

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2662860/

Abstract

BACKGROUND

Common structural biology methods (e.g., NMR and molecular dynamics) often produce ensembles of molecular structures. Consequently, averaging the 3D coordinates of molecular structures (proteins and RNA) is a frequent approach to obtaining a consensus structure that is representative of the ensemble. However, averaging can introduce artifacts that produce unrealistic local geometries, including unphysical bond lengths and angles.
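The artifact can be seen in a toy case. The sketch below is illustrative only and is not taken from the paper: it assumes two already superimposed three-atom Cα traces (the hypothetical `model_a` and `model_b`) with an assumed ideal Cα-Cα spacing of 3.8 Å, and shows that atom-by-atom averaging shortens every virtual bond.

```python
import math

BOND = 3.8  # assumed typical distance between consecutive C-alpha atoms, in angstroms


def average_coords(ensemble):
    """Average coordinates atom by atom across pre-superimposed structures."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    return [
        tuple(sum(model[i][k] for model in ensemble) / n_models for k in range(3))
        for i in range(n_atoms)
    ]


def bond_lengths(coords):
    """Distances between consecutive atoms along the chain."""
    return [math.dist(a, b) for a, b in zip(coords, coords[1:])]


# Two hypothetical conformers: the same chain pointing along x and along y.
model_a = [(0.0, 0.0, 0.0), (BOND, 0.0, 0.0), (2 * BOND, 0.0, 0.0)]
model_b = [(0.0, 0.0, 0.0), (0.0, BOND, 0.0), (0.0, 2 * BOND, 0.0)]

averaged = average_coords([model_a, model_b])
print([round(d, 2) for d in bond_lengths(averaged)])  # prints [2.69, 2.69]
```

Each input model has ideal 3.8 Å bonds, yet the averaged trace has bonds of about 2.69 Å (3.8 divided by the square root of 2), which is the kind of unphysical local geometry described above.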

RESULTS

Herein, we describe a method to derive representative structures while limiting the number of artifacts. Our approach is based on a Monte Carlo simulation technique that drives a starting structure (an extended or a 'close-by' structure) towards the 'averaged structure' using a harmonic pseudo energy function. To assess the performance of the algorithm, we applied our approach to Cα models of 1364 proteins generated by the TASSER structure prediction algorithm. The average RMSD of the refined models from the native structure for the set becomes worse by a mere 0.08 Å compared to the average RMSD of the averaged structures from the native structure (3.36 Å for the refined structures and 3.28 Å for the averaged structures). However, the percentage of atoms involved in clashes is greatly reduced (from 63% to 1%); in fact, the majority of the refined proteins had zero clashes. Moreover, a small number (38) of refined structures had a lower RMSD to the native protein than the corresponding averaged structure. Finally, compared to PULCHRA [1], our approach produces representative structures of similar RMSD quality, but with far fewer clashes.
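The general idea of driving a starting structure towards an averaged one can be sketched as a Metropolis Monte Carlo loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract only specifies a harmonic pseudo energy towards the averaged structure, so the bond and clash terms, every force constant, the 3.8 Å bond length, the 3.6 Å clash cutoff, the move set, and the function names `energy` and `refine` are all hypothetical.

```python
import math
import random

BOND = 3.8   # assumed ideal C-alpha to C-alpha distance (angstroms)
CLASH = 3.6  # assumed clash cutoff for non-adjacent atoms (angstroms)


def energy(coords, target, k_target=1.0, k_bond=10.0, k_clash=10.0):
    """Harmonic pull towards the target plus assumed bond and clash penalties."""
    e = sum(k_target * math.dist(c, t) ** 2 for c, t in zip(coords, target))
    e += sum(k_bond * (math.dist(a, b) - BOND) ** 2
             for a, b in zip(coords, coords[1:]))
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d < CLASH:
                e += k_clash * (CLASH - d) ** 2
    return e


def refine(start, target, steps=20000, step_size=0.3, temperature=0.5, seed=0):
    """Metropolis Monte Carlo; returns the lowest-energy structure visited."""
    rng = random.Random(seed)
    coords = [list(c) for c in start]
    e_cur = energy(coords, target)
    best, e_best = [tuple(c) for c in coords], e_cur
    for _ in range(steps):
        i = rng.randrange(len(coords))
        old = coords[i]
        # Trial move: random displacement of one atom.
        coords[i] = [x + rng.uniform(-step_size, step_size) for x in old]
        e_new = energy(coords, target)
        accept = (e_new <= e_cur
                  or rng.random() < math.exp((e_cur - e_new) / temperature))
        if accept:
            e_cur = e_new
            if e_cur < e_best:
                best, e_best = [tuple(c) for c in coords], e_cur
        else:
            coords[i] = old  # reject: restore the previous position
    return best


# Hypothetical averaged target with compressed 2.69 A bonds, and an extended start.
target = [(0.0, 0.0, 0.0), (1.9, 1.9, 0.0), (3.8, 3.8, 0.0)]
start = [(0.0, 0.0, 0.0), (BOND, 0.0, 0.0), (2 * BOND, 0.0, 0.0)]
refined = refine(start, target)
print([round(math.dist(a, b), 2) for a, b in zip(refined, refined[1:])])
```

Because the assumed bond term is weighted more heavily than the pull towards the target, the refined trace stays near the averaged coordinates while its bond lengths move back towards 3.8 Å. The energy is recomputed in full at every step for clarity; a practical implementation would update only the terms affected by the moved atom.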

CONCLUSION

The benchmarking results demonstrate that our approach for removing averaging artifacts can be very beneficial for the structural biology community. Furthermore, the same approach can be applied to almost any problem where 3D coordinates are averaged. For example, structure averaging is also commonly performed in RNA secondary structure prediction [2], which could also benefit from our approach.
