Probabilistic ensembles for improved inference in protein-structure determination.

Author information

Soni Ameet, Shavlik Jude

Affiliation

Department of Computer Sciences, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA.

Publication information

J Bioinform Comput Biol. 2012 Feb;10(1):1240009. doi: 10.1142/S0219720012400094.

DOI:10.1142/S0219720012400094

PMID:22809310

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3401969/

Abstract

Protein X-ray crystallography--the most popular method for determining protein structures--remains a laborious process requiring a great deal of manual crystallographer effort to interpret low-quality protein images. Automating this process is critical in creating a high-throughput protein-structure determination pipeline. Previously, our group developed ACMI, a probabilistic framework for producing protein-structure models from electron-density maps produced via X-ray crystallography. ACMI uses a Markov Random Field to model the three-dimensional (3D) location of each non-hydrogen atom in a protein. Calculating the best structure in this model is intractable, so ACMI uses approximate inference methods to estimate the optimal structure. While previous results have shown ACMI to be the state-of-the-art method on this task, its approximate inference algorithm remains computationally expensive and susceptible to errors. In this work, we develop Probabilistic Ensembles in ACMI (PEA), a framework for leveraging multiple, independent runs of approximate inference to produce estimates of protein structures. Our results show statistically significant improvements in the accuracy of inference resulting in more complete and accurate protein structures. In addition, PEA provides a general framework for advanced approximate inference methods in complex problem domains.
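The abstract does not say how PEA combines its independent inference runs. Purely as a generic illustration, and not as ACMI's or PEA's actual algorithm, the sketch below shows the general idea of pooling several independent runs of approximate inference on a pairwise Markov Random Field. Everything in it is an assumption chosen for illustration: Gibbs sampling as the approximate inference method, equal-weight averaging of per-node marginals as the pooling rule, the toy potentials, and the hypothetical function names `gibbs_marginals`, `ensemble_marginals`, and `exact_marginals`. The real model places each atom in 3D space, whereas here each node has only `k` discrete candidate locations.

```python
import itertools

import numpy as np


def gibbs_marginals(unary, pairwise, edges, n_sweeps, burn_in, rng):
    """One independent run of approximate inference (Gibbs sampling).

    unary:    (n_nodes, k) nonnegative potentials; hypothetical stand-ins for
              how well each candidate location fits the density map
    pairwise: (k, k) symmetric nonnegative potential shared by every edge
    edges:    list of (i, j) node pairs
    Returns an (n_nodes, k) array of estimated marginals.
    """
    n, k = unary.shape
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    state = rng.integers(k, size=n)  # random start, so each run differs
    counts = np.zeros((n, k))
    for sweep in range(burn_in + n_sweeps):
        for i in range(n):
            # Conditional of node i given its neighbors' current labels
            # (relies on pairwise being symmetric).
            p = unary[i].astype(float)
            for j in neighbors[i]:
                p = p * pairwise[:, state[j]]
            state[i] = rng.choice(k, p=p / p.sum())
        if sweep >= burn_in:  # discard burn-in sweeps
            counts[np.arange(n), state] += 1
    return counts / n_sweeps


def ensemble_marginals(unary, pairwise, edges, n_runs, n_sweeps=500, burn_in=50, seed=0):
    """Pool n_runs independent runs by averaging their marginals (assumed rule)."""
    child_seeds = np.random.SeedSequence(seed).spawn(n_runs)
    runs = [
        gibbs_marginals(unary, pairwise, edges, n_sweeps, burn_in, np.random.default_rng(s))
        for s in child_seeds
    ]
    return np.mean(runs, axis=0)


def exact_marginals(unary, pairwise, edges):
    """Brute-force marginals over all k**n joint states; tiny models only."""
    n, k = unary.shape
    marg = np.zeros((n, k))
    for x in itertools.product(range(k), repeat=n):
        w = np.prod([unary[i, x[i]] for i in range(n)])
        w *= np.prod([pairwise[x[i], x[j]] for i, j in edges])
        for i in range(n):
            marg[i, x[i]] += w
    return marg / marg.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    unary = rng.random((4, 3)) + 0.1              # toy "density fit" scores
    pairwise = np.array([[2.0, 1.0, 0.5],         # toy compatibility scores
                         [1.0, 2.0, 1.0],
                         [0.5, 1.0, 2.0]])
    edges = [(0, 1), (1, 2), (2, 3), (3, 0)]      # a cycle, so the graph is loopy
    approx = ensemble_marginals(unary, pairwise, edges, n_runs=8)
    exact = exact_marginals(unary, pairwise, edges)
    print("ensemble:\n", approx.round(3))
    print("exact:\n", exact.round(3))
    print("max abs error:", np.abs(approx - exact).max())
    print("per-node estimate:", approx.argmax(axis=1))
```

On a toy model this small, `exact_marginals` can enumerate every joint state and serve as a reference for the ensemble estimate. Its cost grows as `k**n`, which illustrates why exact inference becomes infeasible at protein scale. Independent runs started from different random states can settle in different regions of the distribution, and equal-weight averaging is only the simplest way to pool them.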
