Onuk A Emre, Akcakaya Murat, Bardhan Jaydeep P, Erdogmus Deniz, Brooks Dana H, Makowski Lee
Electrical and Computer Engineering Department, Northeastern University, Boston, MA.
Electrical and Computer Engineering Department, University of Pittsburgh, Pittsburgh, PA.
IEEE Trans Signal Process. 2015 Oct 15;63(20):5383-5394. doi: 10.1109/TSP.2015.2455515. Epub 2015 Jul 13.
In this paper, we describe a model for maximum likelihood estimation (MLE) of the relative abundances of different conformations of a protein in a heterogeneous mixture from small-angle X-ray scattering (SAXS) intensities. To handle cases where the solution includes intermediate or unknown conformations, we develop a subset selection method based on k-means clustering and the Cramér-Rao bound on the mixture coefficient estimation error to find a sparse basis set that represents the space spanned by the measured SAXS intensities of the known conformations of a protein. Then, using the selected basis set and the assumptions on the model for the intensity measurements, we show that the MLE model can be expressed as a constrained convex optimization problem. Using the adenylate kinase (ADK) protein and its known conformations as an example, we demonstrate the performance of the proposed estimation scheme with Monte Carlo simulations. Although we use 45 crystallographically determined experimental structures, and could generate many more using, for instance, molecular dynamics calculations, the clustering technique indicates that the data cannot support the determination of relative abundances for more than 5 conformations. The estimation of this maximum number of conformations is intrinsic to the methodology we have used here.
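The constrained convex formulation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes independent Gaussian noise with known standard deviation at each scattering angle (the abstract does not state the noise model), under which the MLE reduces to weighted least squares over abundances that are nonnegative and sum to one. The basis profiles are toy Guinier-type curves with made-up radii of gyration, not ADK data, and the names `guinier_profile`, `estimate_abundances`, and `project_to_simplex` are hypothetical. The solver is plain projected gradient descent, chosen only because it needs no external libraries.

```python
import math
import random


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold from the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def guinier_profile(q_values, rg):
    """Toy intensity curve exp(-(q*Rg)^2 / 3); a stand-in for a real SAXS profile."""
    return [math.exp(-((q * rg) ** 2) / 3.0) for q in q_values]


def weighted_objective(basis, y, sigma, w):
    """Negative Gaussian log-likelihood up to a constant: 0.5 * sum of squared z-scores."""
    total = 0.0
    for i in range(len(y)):
        model = sum(w[k] * basis[k][i] for k in range(len(basis)))
        total += ((y[i] - model) / sigma[i]) ** 2
    return 0.5 * total


def estimate_abundances(basis, y, sigma, iterations=5000):
    """Projected gradient descent for simplex-constrained weighted least squares."""
    k, n = len(basis), len(y)
    inv_var = [1.0 / s ** 2 for s in sigma]
    # Gram matrix A^T W A and right-hand side A^T W y.
    gram = [[sum(basis[a][i] * basis[b][i] * inv_var[i] for i in range(n))
             for b in range(k)] for a in range(k)]
    rhs = [sum(basis[a][i] * y[i] * inv_var[i] for i in range(n)) for a in range(k)]
    # The trace bounds the largest eigenvalue, so this step size never overshoots.
    step = 1.0 / sum(gram[a][a] for a in range(k))
    w = [1.0 / k] * k  # start from equal abundances
    for _ in range(iterations):
        grad = [sum(gram[a][b] * w[b] for b in range(k)) - rhs[a] for a in range(k)]
        w = project_to_simplex([w[a] - step * grad[a] for a in range(k)])
    return w


# Simulated experiment (all numbers below are illustrative assumptions).
rng = random.Random(0)
q_values = [0.05 * (i + 1) for i in range(40)]
basis = [guinier_profile(q_values, rg) for rg in (0.5, 1.5, 4.0)]
true_w = [0.6, 0.3, 0.1]
sigma = [0.01] * len(q_values)
y = [sum(true_w[k] * basis[k][i] for k in range(3)) + rng.gauss(0.0, sigma[i])
     for i in range(len(q_values))]

estimate = estimate_abundances(basis, y, sigma)
print("estimated abundances:", [round(w, 3) for w in estimate])
```

When basis profiles are nearly collinear, as closely related conformations tend to be, the estimate becomes poorly determined. That is consistent with the abstract's point that only a small subset of conformations can be resolved from the data.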