Brunner Konrad, Gronwald Wolfram, Trenner Jochen M, Neidig Klaus-Peter, Kalbitzer Hans Robert
Department of Biophysics and Physical Biochemistry, University of Regensburg, Postfach, D-93040 Regensburg, Federal Republic of Germany.
BMC Struct Biol. 2006 Jun 26;6:14. doi: 10.1186/1472-6807-6-14.
Rapid and accurate three-dimensional structure determination of biological macromolecules is essential to keep pace with the rapid growth in primary sequence information. Over the last few years, the amount of data deposited in the Protein Data Bank has increased substantially, providing additional information for novel structure determination projects. The key question is how to combine the available database information with the experimental data of the current project so that only relevant information is used and a correct structural bias is produced. For this purpose, a novel, fully automated algorithm based on Bayesian reasoning has been developed. It combines structural information from different sources in a consistent way to obtain high-quality structures from a limited set of experimental data. The new ISIC (Intelligent Structural Information Combination) algorithm is part of the larger AUREMOL software package.
Our new approach was successfully tested by using the corresponding X-ray structures to improve the solution NMR structures of the Ras-binding domain of Byr2 from Schizosaccharomyces pombe, the Ras-binding domain of human RalGDS calculated from a limited set of NMR data, and the immunoglobulin-binding domain of protein G from Streptococcus. In all test cases, clearly improved structures were obtained. The greatest danger in using data from other sources is a possible bias towards the added structure: in the worst case, the structure from the additional source is essentially reproduced instead of a refined target structure. We could clearly show that the ISIC algorithm handles these difficulties properly.
In summary, we present a novel, fully automated method to combine strongly coupled knowledge from different sources. Combining it with validation tools such as the calculation of NMR R-factors strengthens the impact of the method considerably, since the improvement of the structures can be assessed quantitatively. The ISIC method can be applied to a large number of similar problems in which the quality of the obtained three-dimensional structures is limited by the available experimental data, such as the improvement of large NMR structures calculated from sparse experimental data or the refinement of low-resolution X-ray structures. Structures may also be refined using other available structural information, such as homology models.
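The idea of combining structural information from two sources by Bayesian reasoning can be illustrated with a minimal sketch. This is not the ISIC algorithm, whose details are not given here. It assumes, purely for illustration, that each source (for example an NMR-derived and an X-ray-derived estimate of the same interatomic distance) can be described by an independent Gaussian with a known uncertainty. Under that assumption, and with a flat prior, the posterior is again Gaussian, with a precision-weighted mean. The names `GaussianEstimate` and `combine` are hypothetical.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class GaussianEstimate:
    """A value (e.g. a distance in angstroms) with its standard deviation."""
    mean: float
    sigma: float


def combine(a: GaussianEstimate, b: GaussianEstimate) -> GaussianEstimate:
    """Combine two independent Gaussian estimates of the same quantity.

    Illustrative assumption: both sources are independent and Gaussian.
    The posterior mean is the precision-weighted average of the two means,
    and the posterior variance is the inverse of the summed precisions.
    """
    if a.sigma <= 0 or b.sigma <= 0:
        raise ValueError("sigma must be positive")
    precision_a = 1.0 / a.sigma ** 2
    precision_b = 1.0 / b.sigma ** 2
    variance = 1.0 / (precision_a + precision_b)
    mean = variance * (precision_a * a.mean + precision_b * b.mean)
    return GaussianEstimate(mean, math.sqrt(variance))


# Hypothetical numbers: a loose NMR-derived estimate and a tighter
# X-ray-derived estimate of the same distance.
nmr = GaussianEstimate(mean=5.0, sigma=1.0)
xray = GaussianEstimate(mean=4.0, sigma=0.5)
posterior = combine(nmr, xray)
print(posterior)  # mean lies closer to the more precise source
```

The sketch also makes the bias problem mentioned above concrete: if the uncertainty assigned to the added source is too small, the combined value is dominated by that source and the experimental data contribute almost nothing.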