Khoo Y, Singer A, Cowburn D
Department of Physics, Princeton University, Princeton, NJ, 08540, USA.
Department of Mathematics, Stanford University, Stanford, CA, 94305, USA.
J Biomol NMR. 2017 Jul;68(3):163-185. doi: 10.1007/s10858-017-0108-7. Epub 2017 Jun 14.
We revisit the problem of protein structure determination from geometrical restraints from NMR, using convex optimization. It is well-known that the NP-hard distance geometry problem of determining atomic positions from pairwise distance restraints can be relaxed into a convex semidefinite program (SDP). However, often the NOE distance restraints are too imprecise and sparse for accurate structure determination. Residual dipolar coupling (RDC) measurements provide additional geometric information on the angles between atom-pair directions and axes of the principal-axis-frame. The optimization problem involving RDC is highly non-convex and requires a good initialization even within the simulated annealing framework. In this paper, we model the protein backbone as an articulated structure composed of rigid units. Determining the rotation of each rigid unit gives the full protein structure. We propose solving the non-convex optimization problems using the sum-of-squares (SOS) hierarchy, a hierarchy of convex relaxations with increasing complexity and approximation power. Unlike classical global optimization approaches, SOS optimization returns a certificate of optimality if the global optimum is found. Based on the SOS method, we proposed two algorithms-RDC-SOS and RDC-NOE-SOS, that have polynomial time complexity in the number of amino-acid residues and run efficiently on a standard desktop. In many instances, the proposed methods exactly recover the solution to the original non-convex optimization problem. To the best of our knowledge this is the first time SOS relaxation is introduced to solve non-convex optimization problems in structural biology. We further introduce a statistical tool, the Cramér-Rao bound (CRB), to provide an information theoretic bound on the highest resolution one can hope to achieve when determining protein structure from noisy measurements using any unbiased estimator. Our simulation results show that when the RDC measurements are corrupted by Gaussian noise of realistic variance, both SOS based algorithms attain the CRB. We successfully apply our method in a divide-and-conquer fashion to determine the structure of ubiquitin from experimental NOE and RDC measurements obtained in two alignment media, achieving more accurate and faster reconstructions compared to the current state of the art.
我们利用凸优化方法重新审视了基于核磁共振几何约束确定蛋白质结构的问题。众所周知,从成对距离约束确定原子位置的NP难距离几何问题可以松弛为一个凸半定规划(SDP)。然而,通常情况下,NOE距离约束对于精确的结构确定来说过于不精确和稀疏。剩余偶极耦合(RDC)测量提供了关于原子对方向与主轴框架轴之间角度的额外几何信息。涉及RDC的优化问题是高度非凸的,即使在模拟退火框架内也需要良好的初始化。在本文中,我们将蛋白质主链建模为由刚性单元组成的关节结构。确定每个刚性单元的旋转即可得到完整的蛋白质结构。我们提出使用平方和(SOS)层次结构来解决非凸优化问题,这是一种具有不断增加的复杂度和逼近能力的凸松弛层次结构。与经典的全局优化方法不同,如果找到全局最优解,SOS优化会返回最优性证书。基于SOS方法,我们提出了两种算法——RDC-SOS和RDC-NOE-SOS,它们在氨基酸残基数量上具有多项式时间复杂度,并且在标准桌面上能高效运行。在许多情况下,所提出的方法能精确恢复原始非凸优化问题的解。据我们所知,这是首次将SOS松弛引入到解决结构生物学中的非凸优化问题。我们进一步引入了一种统计工具——克拉美-罗界(CRB),以提供一个信息理论界,说明在使用任何无偏估计器从噪声测量中确定蛋白质结构时能够期望达到的最高分辨率。我们的模拟结果表明,当RDC测量被具有实际方差的高斯噪声破坏时,两种基于SOS的算法都能达到CRB。我们成功地以分治方式应用我们的方法,根据在两种对齐介质中获得的实验NOE和RDC测量来确定泛素的结构,与当前的技术水平相比,实现了更准确、更快的重建。